2025-12-04T08:55:53.4835536Z Current runner version: '2.330.0' 2025-12-04T08:55:53.4841316Z Runner name: 'i-0e5520d20214059b0' 2025-12-04T08:55:53.4842149Z Runner group name: 'Default' 2025-12-04T08:55:53.4843083Z Machine name: 'ip-10-1-34-86' 2025-12-04T08:55:53.4845990Z ##[group]GITHUB_TOKEN Permissions 2025-12-04T08:55:53.4847981Z Contents: read 2025-12-04T08:55:53.4848480Z Metadata: read 2025-12-04T08:55:53.4848934Z ##[endgroup] 2025-12-04T08:55:53.4850903Z Secret source: Actions 2025-12-04T08:55:53.4851605Z Prepare workflow directory 2025-12-04T08:55:53.5329470Z Prepare all required actions 2025-12-04T08:55:53.5363211Z Getting action download info 2025-12-04T08:55:53.9157077Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd) 2025-12-04T08:55:56.3553774Z Download action repository 'pytorch/pytorch@main' (SHA:eabb7ad2128580ef674446027b95bcf4e21e8df3) 2025-12-04T08:56:12.9315896Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065) 2025-12-04T08:56:13.2910361Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722) 2025-12-04T08:56:13.5227300Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076) 2025-12-04T08:56:13.6948496Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T08:56:13.9285517Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T08:56:14.2146240Z Getting action download info 2025-12-04T08:56:14.3381694Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5) 
2025-12-04T08:56:14.6645041Z Getting action download info 2025-12-04T08:56:14.7961093Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e) 2025-12-04T08:56:15.0462376Z Getting action download info 2025-12-04T08:56:15.1743336Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482) 2025-12-04T08:56:15.3900124Z Getting action download info 2025-12-04T08:56:15.5658901Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32) 2025-12-04T08:56:15.5662879Z ##[group] Inputs 2025-12-04T08:56:15.5663177Z build-environment: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T08:56:15.5670343Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": 
"rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T08:56:15.5677831Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 
2025-12-04T08:56:15.5678443Z sync-tag: 2025-12-04T08:56:15.5679121Z timeout-minutes: 360 2025-12-04T08:56:15.5679321Z use-gha: 2025-12-04T08:56:15.5679481Z dashboard-tag: 2025-12-04T08:56:15.5679669Z s3-bucket: gha-artifacts 2025-12-04T08:56:15.5679866Z aws-role-to-assume: 2025-12-04T08:56:15.5680469Z disable-monitor: false 2025-12-04T08:56:15.5680726Z monitor-log-interval: 5 2025-12-04T08:56:15.5681025Z monitor-data-collect-interval: 1 2025-12-04T08:56:15.5681386Z ##[endgroup] 2025-12-04T08:56:15.5682000Z Complete job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T08:56:15.6278825Z A job started hook has been configured by the self-hosted runner administrator 2025-12-04T08:56:15.6374481Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh' 2025-12-04T08:56:15.6384328Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:56:15.6384888Z ##[endgroup] 2025-12-04T08:56:16.9129657Z Runner Type: lf.linux.g6.4xlarge.experimental.nvidia.gpu 2025-12-04T08:56:16.9130138Z Instance Type: g6.4xlarge 2025-12-04T08:56:16.9130346Z AMI Name: unknown 2025-12-04T08:56:16.9172640Z AMI ID: ami-08982f1c5bf93d976 2025-12-04T08:56:21.6666630Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main 2025-12-04T08:56:21.6666997Z with: 2025-12-04T08:56:21.6667479Z github-secret: *** 2025-12-04T08:56:21.6668009Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash 2025-12-04T08:56:21.6668551Z activate-with-label: false 2025-12-04T08:56:21.6668753Z label: with-ssh 2025-12-04T08:56:21.6668932Z remove-existing-keys: true 2025-12-04T08:56:21.6669134Z fail-silently: true 2025-12-04T08:56:21.6669341Z env: 2025-12-04T08:56:21.6669491Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:21.6669681Z ##[endgroup] 2025-12-04T08:56:21.7719980Z Please see 
https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info. 2025-12-04T08:56:21.7721864Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys 2025-12-04T08:56:21.7849926Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main 2025-12-04T08:56:21.7850463Z with: 2025-12-04T08:56:21.7850642Z no-sudo: true 2025-12-04T08:56:21.7850813Z submodules: recursive 2025-12-04T08:56:21.7851011Z fetch-depth: 0 2025-12-04T08:56:21.7851182Z env: 2025-12-04T08:56:21.7851328Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:21.7851516Z ##[endgroup] 2025-12-04T08:56:21.7914825Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:56:21.7915557Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:56:21.7927068Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:56:21.7927358Z env: 2025-12-04T08:56:21.7927548Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:21.7927775Z ##[endgroup] 2025-12-04T08:56:21.8003580Z ##[group]Run # Use all available CPUs for fetching 2025-12-04T08:56:21.8003924Z # Use all available CPUs for fetching 2025-12-04T08:56:21.8004184Z cd "${GITHUB_WORKSPACE}" 2025-12-04T08:56:21.8004432Z git config --global fetch.parallel 0 2025-12-04T08:56:21.8004717Z git config --global submodule.fetchJobs 0 2025-12-04T08:56:21.8026067Z  2025-12-04T08:56:21.8026565Z # Clean workspace. 
The default checkout action should also do this, but 2025-12-04T08:56:21.8027148Z # do it here as well just in case 2025-12-04T08:56:21.8027565Z if [[ -d .git ]]; then 2025-12-04T08:56:21.8027939Z  if [ -z "${NO_SUDO}" ]; then 2025-12-04T08:56:21.8028396Z  sudo git clean -ffdx 2025-12-04T08:56:21.8028758Z  else 2025-12-04T08:56:21.8029041Z  git clean -ffdx 2025-12-04T08:56:21.8029375Z  fi 2025-12-04T08:56:21.8029654Z fi 2025-12-04T08:56:21.8039380Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:56:21.8039684Z env: 2025-12-04T08:56:21.8039999Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:21.8040219Z NO_SUDO: true 2025-12-04T08:56:21.8040388Z ##[endgroup] 2025-12-04T08:56:21.8152953Z ##[group]Run actions/checkout@v4 2025-12-04T08:56:21.8153177Z with: 2025-12-04T08:56:21.8153363Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:56:21.8153615Z fetch-depth: 0 2025-12-04T08:56:21.8153788Z submodules: recursive 2025-12-04T08:56:21.8153978Z show-progress: false 2025-12-04T08:56:21.8154171Z repository: pytorch/pytorch 2025-12-04T08:56:21.8154488Z token: *** 2025-12-04T08:56:21.8154651Z ssh-strict: true 2025-12-04T08:56:21.8154819Z ssh-user: git 2025-12-04T08:56:21.8154993Z persist-credentials: true 2025-12-04T08:56:21.8155182Z clean: true 2025-12-04T08:56:21.8155370Z sparse-checkout-cone-mode: true 2025-12-04T08:56:21.8155589Z fetch-tags: false 2025-12-04T08:56:21.8155746Z lfs: false 2025-12-04T08:56:21.8155909Z set-safe-directory: true 2025-12-04T08:56:21.8156108Z env: 2025-12-04T08:56:21.8156270Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:56:21.8156452Z ##[endgroup] 2025-12-04T08:56:21.9177593Z Syncing repository: pytorch/pytorch 2025-12-04T08:56:21.9178801Z ##[group]Getting Git version info 2025-12-04T08:56:21.9179180Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2025-12-04T08:56:21.9179695Z [command]/usr/bin/git version 2025-12-04T08:56:21.9179904Z git version 2.50.1 2025-12-04T08:56:21.9193393Z ##[endgroup] 
2025-12-04T08:56:21.9202503Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/f6dcb4ef-2c6d-454a-86d9-cc125074ac45/.gitconfig' 2025-12-04T08:56:21.9221809Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f6dcb4ef-2c6d-454a-86d9-cc125074ac45' before making global git config changes 2025-12-04T08:56:21.9222762Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T08:56:21.9226391Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T08:56:21.9262708Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2025-12-04T08:56:21.9265881Z ##[group]Initializing the repository 2025-12-04T08:56:21.9269427Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T08:56:21.9307336Z hint: Using 'master' as the name for the initial branch. This default branch name 2025-12-04T08:56:21.9308223Z hint: is subject to change. To configure the initial branch name to use in all 2025-12-04T08:56:21.9308827Z hint: of your new repositories, which will suppress this warning, call: 2025-12-04T08:56:21.9309455Z hint: 2025-12-04T08:56:21.9309877Z hint: git config --global init.defaultBranch 2025-12-04T08:56:21.9310325Z hint: 2025-12-04T08:56:21.9310656Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2025-12-04T08:56:21.9311194Z hint: 'development'. 
The just-created branch can be renamed via this command: 2025-12-04T08:56:21.9311591Z hint: 2025-12-04T08:56:21.9311815Z hint: git branch -m 2025-12-04T08:56:21.9312056Z hint: 2025-12-04T08:56:21.9312374Z hint: Disable this message with "git config set advice.defaultBranchName false" 2025-12-04T08:56:21.9313026Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/ 2025-12-04T08:56:21.9319180Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch 2025-12-04T08:56:21.9346237Z ##[endgroup] 2025-12-04T08:56:21.9346845Z ##[group]Disabling automatic garbage collection 2025-12-04T08:56:21.9349494Z [command]/usr/bin/git config --local gc.auto 0 2025-12-04T08:56:21.9374607Z ##[endgroup] 2025-12-04T08:56:21.9375169Z ##[group]Setting up auth 2025-12-04T08:56:21.9380239Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T08:56:21.9406203Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T08:56:21.9733894Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T08:56:21.9760564Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T08:56:22.0070229Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:56:22.0099187Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T08:56:22.0412325Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:56:22.0464275Z ##[endgroup] 2025-12-04T08:56:22.0464893Z ##[group]Fetching 
the repository 2025-12-04T08:56:22.0472484Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T08:57:05.5724597Z From https://github.com/pytorch/pytorch 2025-12-04T08:57:05.5725167Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-12-04T08:57:05.5725963Z * [new branch] 2.9.1 -> origin/2.9.1 2025-12-04T08:57:05.5726555Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-12-04T08:57:05.5727208Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1 2025-12-04T08:57:05.5729046Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-12-04T08:57:05.5730527Z * [new branch] HOPrintFunc -> origin/HOPrintFunc 2025-12-04T08:57:05.5733318Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1 2025-12-04T08:57:05.5735923Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-12-04T08:57:05.5737206Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-12-04T08:57:05.5739281Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-12-04T08:57:05.5740765Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-12-04T08:57:05.5742474Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-12-04T08:57:05.5744048Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-12-04T08:57:05.5745865Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-12-04T08:57:05.5747238Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T08:57:05.5749330Z * [new branch] activation_bench -> origin/activation_bench 2025-12-04T08:57:05.5751404Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T08:57:05.5753608Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T08:57:05.5755275Z * [new branch] adi/test -> origin/adi/test 
2025-12-04T08:57:05.5756860Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T08:57:05.5758486Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T08:57:05.5760273Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T08:57:05.5761928Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T08:57:05.5763566Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T08:57:05.5765115Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T08:57:05.5767084Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T08:57:05.5769931Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T08:57:05.5771599Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T08:57:05.5773571Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T08:57:05.5775121Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T08:57:05.5777530Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T08:57:05.5779065Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T08:57:05.5780653Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T08:57:05.5782383Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T08:57:05.5783912Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T08:57:05.5785528Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T08:57:05.5787048Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T08:57:05.5789113Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T08:57:05.5791075Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T08:57:05.5792759Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 
2025-12-04T08:57:05.5794436Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T08:57:05.5796192Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T08:57:05.5798051Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T08:57:05.5799694Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T08:57:05.5801560Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T08:57:05.5803143Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T08:57:05.5804827Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T08:57:05.5806532Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 2025-12-04T08:57:05.5808163Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T08:57:05.5809791Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T08:57:05.5811425Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T08:57:05.5813076Z * [new branch] aoti_const_device -> origin/aoti_const_device 2025-12-04T08:57:05.5814748Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T08:57:05.5816390Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T08:57:05.5818256Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T08:57:05.5821002Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T08:57:05.5822598Z * [new branch] async_tp -> origin/async_tp 2025-12-04T08:57:05.5824365Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T08:57:05.5826074Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T08:57:05.5827783Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T08:57:05.5829577Z * [new branch] atalman-patch-3 -> 
origin/atalman-patch-3 2025-12-04T08:57:05.5831273Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T08:57:05.5833030Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T08:57:05.5834741Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T08:57:05.5836431Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T08:57:05.5838159Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T08:57:05.5839847Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T08:57:05.5841689Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T08:57:05.5843399Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T08:57:05.5845173Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T08:57:05.5847346Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T08:57:05.5848903Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T08:57:05.5850582Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T08:57:05.5852192Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T08:57:05.5854813Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T08:57:05.5856822Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T08:57:05.5858398Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T08:57:05.5860197Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T08:57:05.5861761Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T08:57:05.5863972Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T08:57:05.5866205Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T08:57:05.5868397Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 
2025-12-04T08:57:05.5869988Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T08:57:05.5871529Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T08:57:05.5873146Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T08:57:05.5874963Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T08:57:05.5876675Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T08:57:05.5878260Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T08:57:05.5880426Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T08:57:05.5882446Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T08:57:05.5883713Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T08:57:05.5885563Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T08:57:05.5887239Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T08:57:05.5888870Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T08:57:05.5890524Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T08:57:05.5892255Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T08:57:05.5893988Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T08:57:05.5895660Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T08:57:05.5897373Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T08:57:05.5899005Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T08:57:05.5900653Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T08:57:05.5902382Z * [new branch] bf/transformer-pin-4-57-3 -> 
origin/bf/transformer-pin-4-57-3 2025-12-04T08:57:05.5904028Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T08:57:05.5905641Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T08:57:05.5907255Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T08:57:05.5908831Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T08:57:05.5910471Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T08:57:05.5912036Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T08:57:05.5913646Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T08:57:05.5915182Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T08:57:05.5916845Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T08:57:05.5919006Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T08:57:05.5920389Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T08:57:05.5922184Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T08:57:05.5923761Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T08:57:05.5925395Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T08:57:05.5926961Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T08:57:05.5928535Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T08:57:05.5930979Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T08:57:05.5932584Z * [new branch] 
brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T08:57:05.5934219Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T08:57:05.5935811Z * [new branch] bwd-backup -> origin/bwd-backup 2025-12-04T08:57:05.5937598Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T08:57:05.5939220Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T08:57:05.5940852Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T08:57:05.5943152Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T08:57:05.5944894Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T08:57:05.5946683Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5948367Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5950086Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5951714Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5953383Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5955183Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5956879Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5958550Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5960262Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5962043Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> 
origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5963729Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5965375Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5967097Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5968786Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5970592Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5972004Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5973832Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5975472Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5977255Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T08:57:05.5978904Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T08:57:05.5980586Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T08:57:05.5982289Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T08:57:05.5983960Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T08:57:05.5985623Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T08:57:05.5987304Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T08:57:05.5988962Z * [new branch] ci_attn -> origin/ci_attn 2025-12-04T08:57:05.5990624Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T08:57:05.5993071Z * [new branch] 
codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T08:57:05.5994513Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T08:57:05.5996636Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T08:57:05.5998601Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T08:57:05.6000104Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T08:57:05.6001832Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T08:57:05.6003464Z * [new branch] context_test -> origin/context_test 2025-12-04T08:57:05.6005828Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T08:57:05.6007922Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T08:57:05.6009611Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T08:57:05.6011856Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T08:57:05.6013365Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T08:57:05.6014918Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T08:57:05.6016486Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T08:57:05.6018329Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T08:57:05.6019865Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T08:57:05.6021686Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T08:57:05.6023680Z * [new branch] csl/lint_testing -> origin/csl/lint_testing 2025-12-04T08:57:05.6025611Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 
2025-12-04T08:57:05.6027268Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T08:57:05.6029021Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T08:57:05.6030650Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T08:57:05.6032280Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T08:57:05.6033916Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T08:57:05.6035672Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T08:57:05.6037336Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T08:57:05.6039041Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T08:57:05.6040829Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T08:57:05.6042448Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T08:57:05.6044042Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T08:57:05.6045656Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T08:57:05.6047315Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T08:57:05.6048887Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T08:57:05.6050563Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T08:57:05.6052299Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T08:57:05.6053901Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T08:57:05.6055864Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T08:57:05.6057903Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-12-04T08:57:05.6059274Z * [new branch] csl/xml_stuff -> 
origin/csl/xml_stuff 2025-12-04T08:57:05.6061058Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T08:57:05.6062669Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T08:57:05.6064359Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T08:57:05.6066548Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T08:57:05.6068834Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T08:57:05.6070374Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T08:57:05.6072096Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T08:57:05.6076974Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T08:57:05.6078684Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T08:57:05.6080830Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T08:57:05.6082535Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T08:57:05.6085038Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T08:57:05.6087355Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T08:57:05.6089083Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T08:57:05.6091402Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T08:57:05.6092589Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T08:57:05.6094447Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T08:57:05.6096204Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T08:57:05.6098016Z * [new branch] 
dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T08:57:05.6099965Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T08:57:05.6102080Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T08:57:05.6104327Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T08:57:05.6106154Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T08:57:05.6107999Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T08:57:05.6109811Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T08:57:05.6111405Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T08:57:05.6113111Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T08:57:05.6114716Z * [new branch] docs -> origin/docs 2025-12-04T08:57:05.6116514Z * [new branch] documentation -> origin/documentation 2025-12-04T08:57:05.6118381Z * [new branch] eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T08:57:05.6120706Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T08:57:05.6122259Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T08:57:05.6123610Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T08:57:05.6125350Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T08:57:05.6127081Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T08:57:05.6128779Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T08:57:05.6130419Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T08:57:05.6132094Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T08:57:05.6133697Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T08:57:05.6135887Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T08:57:05.6137544Z * [new branch] 
exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T08:57:05.6138867Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T08:57:05.6140613Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T08:57:05.6142287Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T08:57:05.6144169Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T08:57:05.6146215Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T08:57:05.6147775Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T08:57:05.6149552Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T08:57:05.6151366Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T08:57:05.6152737Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T08:57:05.6154601Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T08:57:05.6156092Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T08:57:05.6157795Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T08:57:05.6159606Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T08:57:05.6161372Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T08:57:05.6163031Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 2025-12-04T08:57:05.6164716Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> 
origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T08:57:05.6166424Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T08:57:05.6168069Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T08:57:05.6169619Z * [new branch] exec -> origin/exec 2025-12-04T08:57:05.6171481Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T08:57:05.6173210Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-12-04T08:57:05.6174870Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T08:57:05.6176646Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T08:57:05.6178228Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T08:57:05.6179840Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T08:57:05.6181547Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T08:57:05.6183262Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T08:57:05.6184908Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T08:57:05.6186495Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T08:57:05.6188143Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T08:57:05.6189900Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T08:57:05.6191517Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T08:57:05.6193162Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T08:57:05.6194872Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T08:57:05.6196550Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T08:57:05.6198135Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T08:57:05.6199835Z * [new branch] export-D83769447 -> origin/export-D83769447 2025-12-04T08:57:05.6201567Z * [new branch] 
export-D84089824 -> origin/export-D84089824 2025-12-04T08:57:05.6203155Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T08:57:05.6205172Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T08:57:05.6206818Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T08:57:05.6208650Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T08:57:05.6210227Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T08:57:05.6211943Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T08:57:05.6214034Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T08:57:05.6215808Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T08:57:05.6217769Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T08:57:05.6219453Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T08:57:05.6221150Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T08:57:05.6222815Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T08:57:05.6224662Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T08:57:05.6226268Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T08:57:05.6227860Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T08:57:05.6229431Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T08:57:05.6231594Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T08:57:05.6233244Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T08:57:05.6235436Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T08:57:05.6237129Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T08:57:05.6239310Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T08:57:05.6241231Z * [new branch] fca -> 
origin/fca 2025-12-04T08:57:05.6242873Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-12-04T08:57:05.6244454Z * [new branch] fca5 -> origin/fca5 2025-12-04T08:57:05.6246628Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T08:57:05.6248221Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T08:57:05.6250212Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T08:57:05.6251791Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T08:57:05.6253951Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T08:57:05.6255558Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T08:57:05.6257477Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T08:57:05.6259061Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T08:57:05.6260572Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T08:57:05.6262127Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T08:57:05.6263699Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T08:57:05.6265246Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T08:57:05.6266955Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T08:57:05.6268582Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T08:57:05.6270214Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T08:57:05.6272132Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T08:57:05.6273654Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T08:57:05.6275239Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 2025-12-04T08:57:05.6276809Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 
2025-12-04T08:57:05.6278459Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T08:57:05.6280067Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T08:57:05.6281832Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T08:57:05.6283541Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T08:57:05.6285169Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T08:57:05.6286811Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T08:57:05.6288392Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T08:57:05.6290130Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T08:57:05.6291711Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T08:57:05.6294105Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T08:57:05.6295723Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T08:57:05.6297308Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T08:57:05.6298984Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T08:57:05.6300631Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T08:57:05.6302861Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T08:57:05.6304563Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T08:57:05.6306949Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T08:57:05.6309052Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 2025-12-04T08:57:05.6312200Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T08:57:05.6313803Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T08:57:05.6316445Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T08:57:05.6318489Z * [new branch] gh/EikanWang/67/head -> 
origin/gh/EikanWang/67/head 2025-12-04T08:57:05.6321629Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T08:57:05.6323233Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T08:57:05.6325869Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T08:57:05.6327531Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T08:57:05.6329203Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T08:57:05.6331316Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T08:57:05.6332886Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T08:57:05.6334503Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T08:57:05.6336681Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T08:57:05.6338498Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T08:57:05.6339796Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T08:57:05.6341974Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T08:57:05.6343514Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T08:57:05.6345092Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T08:57:05.6347209Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T08:57:05.6349038Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T08:57:05.6350646Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T08:57:05.6352788Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T08:57:05.6354405Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T08:57:05.6356004Z * [new branch] gh/H-Huang/228/orig -> origin/gh/H-Huang/228/orig 2025-12-04T08:57:05.6358635Z * [new branch] gh/IvanKobzarev/150/base -> 
origin/gh/IvanKobzarev/150/base 2025-12-04T08:57:05.6360231Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T08:57:05.6361838Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T08:57:05.6364100Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T08:57:05.6365742Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T08:57:05.6367319Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T08:57:05.6369536Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T08:57:05.6371117Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T08:57:05.6372715Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T08:57:05.6374980Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T08:57:05.6376732Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T08:57:05.6378383Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T08:57:05.6380588Z * [new branch] gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T08:57:05.6382159Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T08:57:05.6383727Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T08:57:05.6385967Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T08:57:05.6387566Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T08:57:05.6389215Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T08:57:05.6391386Z * [new branch] gh/IvanKobzarev/167/base -> origin/gh/IvanKobzarev/167/base 2025-12-04T08:57:05.6392900Z * [new branch] gh/IvanKobzarev/167/head -> 
origin/gh/IvanKobzarev/167/head 2025-12-04T08:57:05.6394456Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T08:57:05.6396595Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T08:57:05.6398193Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T08:57:05.6399955Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T08:57:05.6402138Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T08:57:05.6403689Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T08:57:05.6405246Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T08:57:05.6407272Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T08:57:05.6408849Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T08:57:05.6410491Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T08:57:05.6412765Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T08:57:05.6414343Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T08:57:05.6415924Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T08:57:05.6418325Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T08:57:05.6419979Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T08:57:05.6421599Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T08:57:05.6423853Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T08:57:05.6425419Z * [new branch] gh/IvanKobzarev/173/head -> origin/gh/IvanKobzarev/173/head 2025-12-04T08:57:05.6426996Z * [new branch] gh/IvanKobzarev/173/orig -> 
origin/gh/IvanKobzarev/173/orig 2025-12-04T08:57:05.6429240Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T08:57:05.6430885Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T08:57:05.6432496Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T08:57:05.6434648Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T08:57:05.6436405Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T08:57:05.6438008Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T08:57:05.6440483Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T08:57:05.6442116Z * [new branch] gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T08:57:05.6443680Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T08:57:05.6446146Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T08:57:05.6447765Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T08:57:05.6449412Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T08:57:05.6451714Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T08:57:05.6453350Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T08:57:05.6454927Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T08:57:05.6457198Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T08:57:05.6458761Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T08:57:05.6460364Z * [new branch] gh/IvanKobzarev/179/orig -> origin/gh/IvanKobzarev/179/orig 2025-12-04T08:57:05.6462878Z * [new branch] gh/IvanKobzarev/180/base -> 
origin/gh/IvanKobzarev/180/base 2025-12-04T08:57:05.6464282Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T08:57:05.6465968Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T08:57:05.6468295Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T08:57:05.6469907Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T08:57:05.6471588Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T08:57:05.6473938Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T08:57:05.6475496Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T08:57:05.6477075Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T08:57:05.6479460Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T08:57:05.6481235Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T08:57:05.6482847Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T08:57:05.6485258Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T08:57:05.6486853Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T08:57:05.6488502Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T08:57:05.6491134Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T08:57:05.6492833Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T08:57:05.6494916Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T08:57:05.6496460Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-12-04T08:57:05.6498812Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 
2025-12-04T08:57:05.6500445Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T08:57:05.6502553Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T08:57:05.6504188Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T08:57:05.6505788Z * [new branch] gh/NikhilAPatel/5/orig -> origin/gh/NikhilAPatel/5/orig 2025-12-04T08:57:05.6508363Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T08:57:05.6509997Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T08:57:05.6511616Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T08:57:05.6513826Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T08:57:05.6528977Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T08:57:05.6529399Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T08:57:05.6529908Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T08:57:05.6530350Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T08:57:05.6530706Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T08:57:05.6531054Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T08:57:05.6531587Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T08:57:05.6532130Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T08:57:05.6532614Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T08:57:05.6533101Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T08:57:05.6533552Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T08:57:05.6535455Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T08:57:05.6536992Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-12-04T08:57:05.6538563Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 
2025-12-04T08:57:05.6540674Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T08:57:05.6542218Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T08:57:05.6543856Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T08:57:05.6546047Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T08:57:05.6547496Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T08:57:05.6549063Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T08:57:05.6551263Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T08:57:05.6553468Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T08:57:05.6554945Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T08:57:05.6556547Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T08:57:05.6558724Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T08:57:05.6560310Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T08:57:05.6561859Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T08:57:05.6564001Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T08:57:05.6565441Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T08:57:05.6567070Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T08:57:05.6569199Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T08:57:05.6570764Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T08:57:05.6572368Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T08:57:05.6574881Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T08:57:05.6576581Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T08:57:05.6578244Z * [new branch] gh/PaulZhang12/25/orig -> origin/gh/PaulZhang12/25/orig 
2025-12-04T08:57:05.6580416Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T08:57:05.6582063Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T08:57:05.6583674Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T08:57:05.6586014Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T08:57:05.6587543Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T08:57:05.6589277Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T08:57:05.6591750Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T08:57:05.6593873Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T08:57:05.6595187Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T08:57:05.6597420Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T08:57:05.6599045Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T08:57:05.6600738Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T08:57:05.6603009Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T08:57:05.6604571Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T08:57:05.6606697Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T08:57:05.6608346Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T08:57:05.6609914Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T08:57:05.6611934Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T08:57:05.6613581Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T08:57:05.6615772Z * [new branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 
2025-12-04T08:57:05.6617452Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T08:57:05.6619796Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T08:57:05.6622006Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T08:57:05.6623638Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T08:57:05.6625252Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T08:57:05.6627438Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T08:57:05.6629127Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T08:57:05.6630938Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T08:57:05.6632973Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T08:57:05.6634562Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T08:57:05.6636156Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T08:57:05.6638707Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T08:57:05.6640404Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T08:57:05.6643184Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T08:57:05.6644762Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 2025-12-04T08:57:05.6647029Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T08:57:05.6648666Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T08:57:05.6650345Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T08:57:05.6652375Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 2025-12-04T08:57:05.6654015Z * [new branch] gh/SherlockNoMad/11/head -> 
origin/gh/SherlockNoMad/11/head 2025-12-04T08:57:05.6655708Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T08:57:05.6657866Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T08:57:05.6659268Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T08:57:05.6660947Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T08:57:05.6663124Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T08:57:05.6664730Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T08:57:05.6666357Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 2025-12-04T08:57:05.6668540Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T08:57:05.6670105Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T08:57:05.6671682Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T08:57:05.6673929Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T08:57:05.6675727Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T08:57:05.6677380Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T08:57:05.6679336Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T08:57:05.6681085Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T08:57:05.6682752Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T08:57:05.6684745Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T08:57:05.6686254Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 2025-12-04T08:57:05.6688307Z * [new branch] gh/SherlockNoMad/20/base -> 
origin/gh/SherlockNoMad/20/base 2025-12-04T08:57:05.6690023Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T08:57:05.6691547Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T08:57:05.6693868Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T08:57:05.6695553Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T08:57:05.6697096Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T08:57:05.6699078Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T08:57:05.6700659Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T08:57:05.6702695Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T08:57:05.6704223Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T08:57:05.6706188Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T08:57:05.6707751Z * [new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T08:57:05.6710845Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T08:57:05.6712942Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T08:57:05.6714936Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T08:57:05.6717252Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T08:57:05.6720095Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T08:57:05.6721879Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T08:57:05.6723894Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T08:57:05.6725504Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 
2025-12-04T08:57:05.6727528Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T08:57:05.6729156Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T08:57:05.6731244Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T08:57:05.6732749Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T08:57:05.6734483Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T08:57:05.6737194Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T08:57:05.6738612Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T08:57:05.6740270Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T08:57:05.6742382Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T08:57:05.6744021Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T08:57:05.6745653Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T08:57:05.6747870Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T08:57:05.6749421Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T08:57:05.6750949Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T08:57:05.6753165Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T08:57:05.6754714Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T08:57:05.6756323Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T08:57:05.6758316Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T08:57:05.6759961Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T08:57:05.6761607Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T08:57:05.6763810Z * [new branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 
2025-12-04T08:57:05.6765423Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T08:57:05.6767014Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T08:57:05.6769067Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T08:57:05.6770682Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T08:57:05.6772280Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T08:57:05.6774434Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T08:57:05.6775959Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 2025-12-04T08:57:05.6777565Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T08:57:05.6779731Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T08:57:05.6781448Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T08:57:05.6783140Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T08:57:05.6785854Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T08:57:05.6787405Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T08:57:05.6788953Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T08:57:05.6791092Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T08:57:05.6792714Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T08:57:05.6794385Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T08:57:05.6796429Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T08:57:05.6798000Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T08:57:05.6799704Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T08:57:05.6801939Z * [new branch] gh/XuehaiPan/253/base -> 
origin/gh/XuehaiPan/253/base 2025-12-04T08:57:05.6803511Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T08:57:05.6805081Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T08:57:05.6807254Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T08:57:05.6808884Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T08:57:05.6810505Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T08:57:05.6812607Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T08:57:05.6814181Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T08:57:05.6815790Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-12-04T08:57:05.6818113Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T08:57:05.6819696Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T08:57:05.6821329Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T08:57:05.6823421Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T08:57:05.6824990Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T08:57:05.6826570Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T08:57:05.6828742Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T08:57:05.6830318Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T08:57:05.6831959Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T08:57:05.6834088Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T08:57:05.6835632Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T08:57:05.6837263Z * [new branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 
2025-12-04T08:57:05.6839337Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T08:57:05.6841055Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T08:57:05.6842635Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T08:57:05.6844773Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T08:57:05.6846455Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T08:57:05.6848193Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T08:57:05.6850308Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T08:57:05.6851872Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T08:57:05.6853969Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T08:57:05.6855549Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T08:57:05.6857164Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T08:57:05.6859426Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T08:57:05.6860974Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T08:57:05.6862750Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T08:57:05.6864874Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T08:57:05.6866436Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T08:57:05.6867984Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T08:57:05.6870114Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T08:57:05.6871754Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T08:57:05.6873319Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 2025-12-04T08:57:05.6875918Z * [new 
branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T08:57:05.6877493Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T08:57:05.6879575Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T08:57:05.6881975Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T08:57:05.6883559Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T08:57:05.6885198Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T08:57:05.6887382Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T08:57:05.6888960Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T08:57:05.6890555Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig 2025-12-04T08:57:05.6892666Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T08:57:05.6894330Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T08:57:05.6895958Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T08:57:05.6898168Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T08:57:05.6899772Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T08:57:05.6901344Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T08:57:05.6904001Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T08:57:05.6905600Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T08:57:05.6907313Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T08:57:05.6909433Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T08:57:05.6911145Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-12-04T08:57:05.6913210Z * [new branch] 
gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T08:57:05.6914698Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T08:57:05.6916898Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T08:57:05.6918817Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T08:57:05.6921143Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T08:57:05.6922662Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T08:57:05.6924816Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T08:57:05.6926381Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T08:57:05.6928501Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T08:57:05.6930048Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T08:57:05.6932097Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T08:57:05.6933632Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T08:57:05.6935212Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T08:57:05.6937800Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T08:57:05.6939474Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T08:57:05.6941496Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T08:57:05.6943035Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T08:57:05.6945185Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T08:57:05.6946836Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T08:57:05.6948378Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-12-04T08:57:05.6950912Z * [new branch] 
gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T08:57:05.6952553Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T08:57:05.6954159Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T08:57:05.6956531Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T08:57:05.6958988Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T08:57:05.6960675Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T08:57:05.6962318Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T08:57:05.6964446Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T08:57:05.6966080Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T08:57:05.6967667Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T08:57:05.6969852Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T08:57:05.6971385Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T08:57:05.6972967Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T08:57:05.6975767Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-12-04T08:57:05.6977254Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T08:57:05.6978835Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T08:57:05.6981506Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T08:57:05.6983268Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T08:57:05.6984868Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T08:57:05.6987216Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 
2025-12-04T08:57:05.6988886Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T08:57:05.6990577Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T08:57:05.6993158Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T08:57:05.6994914Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T08:57:05.6997124Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T08:57:05.6998821Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T08:57:05.7001126Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T08:57:05.7002758Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T08:57:05.7004351Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-12-04T08:57:05.7006669Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T08:57:05.7008197Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T08:57:05.7009832Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T08:57:05.7012119Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T08:57:05.7013776Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T08:57:05.7015376Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T08:57:05.7018134Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T08:57:05.7019739Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T08:57:05.7021911Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T08:57:05.7023579Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T08:57:05.7025151Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-12-04T08:57:05.7027280Z * [new branch] 
gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T08:57:05.7028888Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T08:57:05.7030426Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T08:57:05.7032672Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T08:57:05.7034172Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T08:57:05.7035804Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T08:57:05.7038038Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T08:57:05.7039605Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T08:57:05.7041728Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T08:57:05.7043655Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T08:57:05.7045222Z * [new branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T08:57:05.7046793Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T08:57:05.7048923Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T08:57:05.7050500Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T08:57:05.7052107Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T08:57:05.7054500Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T08:57:05.7056353Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T08:57:05.7058078Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T08:57:05.7060226Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T08:57:05.7061841Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T08:57:05.7063460Z * [new branch] gh/angelayi/133/orig -> origin/gh/angelayi/133/orig 
2025-12-04T08:57:05.7065791Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T08:57:05.7067572Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T08:57:05.7069213Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T08:57:05.7071477Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T08:57:05.7073159Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T08:57:05.7074739Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T08:57:05.7076846Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T08:57:05.7078399Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T08:57:05.7080086Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 2025-12-04T08:57:05.7082340Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T08:57:05.7083850Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T08:57:05.7085603Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T08:57:05.7087638Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T08:57:05.7089208Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T08:57:05.7090742Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T08:57:05.7092855Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T08:57:05.7094494Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T08:57:05.7096091Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T08:57:05.7098308Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T08:57:05.7099977Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T08:57:05.7101663Z * [new branch] gh/angelayi/140/orig -> 
origin/gh/angelayi/140/orig 2025-12-04T08:57:05.7104295Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T08:57:05.7106037Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T08:57:05.7107646Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T08:57:05.7109821Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T08:57:05.7111445Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T08:57:05.7113041Z * [new branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T08:57:05.7115257Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T08:57:05.7116824Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T08:57:05.7120858Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig 2025-12-04T08:57:05.7123057Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T08:57:05.7124701Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T08:57:05.7126361Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T08:57:05.7129128Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T08:57:05.7130870Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T08:57:05.7132368Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T08:57:05.7134591Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T08:57:05.7136207Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T08:57:05.7137804Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T08:57:05.7140017Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T08:57:05.7141661Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 
2025-12-04T08:57:05.7143229Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T08:57:05.7145470Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T08:57:05.7147072Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T08:57:05.7148675Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T08:57:05.7150854Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T08:57:05.7152431Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T08:57:05.7154064Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T08:57:05.7156369Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T08:57:05.7157851Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T08:57:05.7159392Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T08:57:05.7161697Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T08:57:05.7163331Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T08:57:05.7164989Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T08:57:05.7167175Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T08:57:05.7168763Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T08:57:05.7170354Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T08:57:05.7172663Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T08:57:05.7174191Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T08:57:05.7175714Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T08:57:05.7177955Z * [new branch] 
gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T08:57:05.7179600Z * [new branch] gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T08:57:05.7181213Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T08:57:05.7183360Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T08:57:05.7185017Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T08:57:05.7186670Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T08:57:05.7188834Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T08:57:05.7190572Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T08:57:05.7192252Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 2025-12-04T08:57:05.7194382Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T08:57:05.7196003Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T08:57:05.7197577Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T08:57:05.7199716Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T08:57:05.7201432Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T08:57:05.7203046Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T08:57:05.7205232Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T08:57:05.7206848Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T08:57:05.7208502Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T08:57:05.7210692Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T08:57:05.7212334Z * [new branch] gh/anijain2305/943/head -> origin/gh/anijain2305/943/head 
2025-12-04T08:57:05.7213918Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T08:57:05.7216625Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T08:57:05.7218440Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T08:57:05.7219999Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T08:57:05.7222660Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T08:57:05.7224329Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T08:57:05.7225902Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T08:57:05.7228098Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T08:57:05.7229675Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T08:57:05.7231309Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T08:57:05.7233495Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T08:57:05.7235303Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T08:57:05.7236968Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T08:57:05.7239396Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T08:57:05.7241092Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T08:57:05.7242666Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T08:57:05.7244948Z * [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T08:57:05.7246614Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T08:57:05.7248331Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T08:57:05.7250561Z * [new branch] 
gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T08:57:05.7252147Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T08:57:05.7253705Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T08:57:05.7256079Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T08:57:05.7257630Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T08:57:05.7259219Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T08:57:05.7261502Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T08:57:05.7263097Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T08:57:05.7264696Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 2025-12-04T08:57:05.7266926Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T08:57:05.7268516Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T08:57:05.7270148Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T08:57:05.7272376Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T08:57:05.7274041Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T08:57:05.7275592Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T08:57:05.7277928Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T08:57:05.7279556Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T08:57:05.7281293Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T08:57:05.7283560Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T08:57:05.7285147Z * [new branch] gh/anijain2305/956/head -> origin/gh/anijain2305/956/head 
2025-12-04T08:57:05.7286761Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T08:57:05.7289059Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T08:57:05.7290656Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T08:57:05.7292250Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T08:57:05.7294517Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T08:57:05.7296177Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T08:57:05.7297853Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T08:57:05.7300063Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T08:57:05.7301729Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T08:57:05.7303368Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T08:57:05.7305569Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T08:57:05.7307237Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 2025-12-04T08:57:05.7308816Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T08:57:05.7311081Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T08:57:05.7312722Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T08:57:05.7314319Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T08:57:05.7316561Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T08:57:05.7318349Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T08:57:05.7319998Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T08:57:05.7322616Z * [new branch] 
gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T08:57:05.7324313Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T08:57:05.7325953Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T08:57:05.7328275Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T08:57:05.7329953Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T08:57:05.7331532Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T08:57:05.7333727Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T08:57:05.7335357Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T08:57:05.7337011Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T08:57:05.7339901Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T08:57:05.7342201Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T08:57:05.7344607Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T08:57:05.7348056Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T08:57:05.7350522Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T08:57:05.7352780Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T08:57:05.7355784Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T08:57:05.7358154Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T08:57:05.7360195Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T08:57:05.7363358Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T08:57:05.7365658Z * [new branch] gh/anijain2305/969/head -> origin/gh/anijain2305/969/head 
2025-12-04T08:57:05.7367962Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T08:57:05.7371298Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T08:57:05.7373418Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T08:57:05.7375720Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T08:57:05.7379485Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T08:57:05.7380945Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T08:57:05.7382523Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T08:57:05.7385323Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-12-04T08:57:05.7386970Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T08:57:05.7388965Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T08:57:05.7390560Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T08:57:05.7392507Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T08:57:05.7394078Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T08:57:05.7396162Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T08:57:05.7397720Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T08:57:05.7399685Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T08:57:05.7401352Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T08:57:05.7403639Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T08:57:05.7405233Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T08:57:05.7407417Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T08:57:05.7409106Z * [new branch] gh/anshul-si/58/head -> 
origin/gh/anshul-si/58/head 2025-12-04T08:57:05.7411126Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T08:57:05.7412711Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T08:57:05.7414308Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T08:57:05.7416401Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T08:57:05.7418190Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T08:57:05.7419803Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T08:57:05.7422103Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T08:57:05.7423687Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T08:57:05.7425273Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T08:57:05.7427632Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T08:57:05.7429219Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T08:57:05.7430818Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T08:57:05.7432925Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T08:57:05.7434577Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T08:57:05.7436209Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T08:57:05.7438563Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T08:57:05.7440182Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T08:57:05.7441790Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T08:57:05.7443957Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T08:57:05.7445604Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 2025-12-04T08:57:05.7447181Z * [new branch] 
gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T08:57:05.7449473Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T08:57:05.7451106Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T08:57:05.7452697Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 2025-12-04T08:57:05.7455439Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T08:57:05.7457028Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T08:57:05.7459415Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T08:57:05.7461202Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T08:57:05.7462865Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T08:57:05.7465075Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T08:57:05.7466600Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T08:57:05.7468199Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T08:57:05.7470372Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T08:57:05.7471946Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T08:57:05.7474328Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T08:57:05.7475869Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T08:57:05.7477633Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T08:57:05.7479826Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T08:57:05.7481750Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T08:57:05.7483342Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T08:57:05.7485589Z * [new branch] gh/aorenste/147/base -> origin/gh/aorenste/147/base 
2025-12-04T08:57:05.7487359Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T08:57:05.7488950Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T08:57:05.7491117Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T08:57:05.7492749Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T08:57:05.7494428Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T08:57:05.7496588Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T08:57:05.7498176Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T08:57:05.7499771Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T08:57:05.7501933Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T08:57:05.7503682Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T08:57:05.7505290Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T08:57:05.7507294Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T08:57:05.7508894Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T08:57:05.7510528Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T08:57:05.7512676Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T08:57:05.7514235Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T08:57:05.7515856Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T08:57:05.7519409Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T08:57:05.7521188Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T08:57:05.7522811Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 2025-12-04T08:57:05.7524765Z * [new branch] gh/aorenste/154/base -> 
origin/gh/aorenste/154/base 2025-12-04T08:57:05.7526426Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T08:57:05.7528211Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T08:57:05.7530010Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T08:57:05.7531591Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T08:57:05.7533221Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T08:57:05.7535352Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T08:57:05.7536856Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T08:57:05.7538320Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T08:57:05.7540780Z * [new branch] gh/aorenste/157/base -> origin/gh/aorenste/157/base 2025-12-04T08:57:05.7542404Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T08:57:05.7544010Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T08:57:05.7546176Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T08:57:05.7547878Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T08:57:05.7549455Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T08:57:05.7551532Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T08:57:05.7553114Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T08:57:05.7554685Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T08:57:05.7557369Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T08:57:05.7558963Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T08:57:05.7561080Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 2025-12-04T08:57:05.7562656Z * 
[new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T08:57:05.7564181Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T08:57:05.7567080Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T08:57:05.7568821Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T08:57:05.7570345Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T08:57:05.7572480Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T08:57:05.7574035Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T08:57:05.7575604Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T08:57:05.7577896Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T08:57:05.7579429Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T08:57:05.7581010Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T08:57:05.7583266Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T08:57:05.7584964Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T08:57:05.7586563Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T08:57:05.7588857Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T08:57:05.7590469Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 2025-12-04T08:57:05.7592087Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T08:57:05.7594398Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T08:57:05.7596092Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T08:57:05.7597720Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T08:57:05.7600410Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 2025-12-04T08:57:05.7602199Z * [new 
branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T08:57:05.7603790Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T08:57:05.7605942Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T08:57:05.7607985Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T08:57:05.7609387Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T08:57:05.7611745Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T08:57:05.7613416Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T08:57:05.7615046Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T08:57:05.7617410Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T08:57:05.7619127Z * [new branch] gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T08:57:05.7620691Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T08:57:05.7622831Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T08:57:05.7624529Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T08:57:05.7626447Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T08:57:05.7628474Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T08:57:05.7630210Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T08:57:05.7631819Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T08:57:05.7634780Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T08:57:05.7636181Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T08:57:05.7637741Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T08:57:05.7639834Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 
2025-12-04T08:57:05.7641736Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T08:57:05.7643303Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T08:57:05.7645463Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T08:57:05.7647051Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T08:57:05.7648717Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T08:57:05.7650786Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T08:57:05.7652352Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T08:57:05.7653989Z * [new branch] gh/benjaminglass1/107/orig -> origin/gh/benjaminglass1/107/orig 2025-12-04T08:57:05.7656108Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T08:57:05.7657719Z * [new branch] gh/benjaminglass1/108/head -> origin/gh/benjaminglass1/108/head 2025-12-04T08:57:05.7659391Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T08:57:05.7661495Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T08:57:05.7663056Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T08:57:05.7664649Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T08:57:05.7666759Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T08:57:05.7668376Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T08:57:05.7669986Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T08:57:05.7672521Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T08:57:05.7674152Z * [new branch] gh/bobrenjc93/570/head -> 
origin/gh/bobrenjc93/570/head 2025-12-04T08:57:05.7675789Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T08:57:05.7677814Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T08:57:05.7679425Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T08:57:05.7681121Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T08:57:05.7683781Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T08:57:05.7685389Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T08:57:05.7686956Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T08:57:05.7689113Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 2025-12-04T08:57:05.7690658Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T08:57:05.7692264Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T08:57:05.7694490Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T08:57:05.7696222Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T08:57:05.7697828Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T08:57:05.7699907Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T08:57:05.7701478Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T08:57:05.7703096Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T08:57:05.7705191Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T08:57:05.7706720Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T08:57:05.7708296Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T08:57:05.7710437Z * [new branch] gh/bobrenjc93/679/base -> 
origin/gh/bobrenjc93/679/base 2025-12-04T08:57:05.7712255Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T08:57:05.7713838Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T08:57:05.7716084Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T08:57:05.7717709Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T08:57:05.7719517Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T08:57:05.7721787Z * [new branch] gh/bobrenjc93/681/base -> origin/gh/bobrenjc93/681/base 2025-12-04T08:57:05.7723379Z * [new branch] gh/bobrenjc93/681/head -> origin/gh/bobrenjc93/681/head 2025-12-04T08:57:05.7724946Z * [new branch] gh/bobrenjc93/681/orig -> origin/gh/bobrenjc93/681/orig 2025-12-04T08:57:05.7726952Z * [new branch] gh/bobrenjc93/682/base -> origin/gh/bobrenjc93/682/base 2025-12-04T08:57:05.7728551Z * [new branch] gh/bobrenjc93/682/head -> origin/gh/bobrenjc93/682/head 2025-12-04T08:57:05.7730017Z * [new branch] gh/bobrenjc93/682/orig -> origin/gh/bobrenjc93/682/orig 2025-12-04T08:57:05.7732107Z * [new branch] gh/bobrenjc93/683/base -> origin/gh/bobrenjc93/683/base 2025-12-04T08:57:05.7733716Z * [new branch] gh/bobrenjc93/683/head -> origin/gh/bobrenjc93/683/head 2025-12-04T08:57:05.7735346Z * [new branch] gh/bobrenjc93/683/orig -> origin/gh/bobrenjc93/683/orig 2025-12-04T08:57:05.7737805Z * [new branch] gh/bobrenjc93/684/base -> origin/gh/bobrenjc93/684/base 2025-12-04T08:57:05.7739270Z * [new branch] gh/bobrenjc93/684/head -> origin/gh/bobrenjc93/684/head 2025-12-04T08:57:05.7741019Z * [new branch] gh/bobrenjc93/684/orig -> origin/gh/bobrenjc93/684/orig 2025-12-04T08:57:05.7743105Z * [new branch] gh/bobrenjc93/685/base -> origin/gh/bobrenjc93/685/base 2025-12-04T08:57:05.7744929Z * [new branch] gh/bobrenjc93/685/head -> origin/gh/bobrenjc93/685/head 2025-12-04T08:57:05.7746756Z * [new branch] gh/bobrenjc93/685/orig -> 
origin/gh/bobrenjc93/685/orig 2025-12-04T08:57:05.7749160Z * [new branch] gh/bobrenjc93/686/base -> origin/gh/bobrenjc93/686/base 2025-12-04T08:57:05.7750730Z * [new branch] gh/bobrenjc93/686/head -> origin/gh/bobrenjc93/686/head 2025-12-04T08:57:05.7752344Z * [new branch] gh/bobrenjc93/686/orig -> origin/gh/bobrenjc93/686/orig 2025-12-04T08:57:05.7754821Z * [new branch] gh/bobrenjc93/687/base -> origin/gh/bobrenjc93/687/base 2025-12-04T08:57:05.7756976Z * [new branch] gh/bobrenjc93/687/head -> origin/gh/bobrenjc93/687/head 2025-12-04T08:57:05.7758641Z * [new branch] gh/bobrenjc93/687/orig -> origin/gh/bobrenjc93/687/orig 2025-12-04T08:57:05.7761266Z * [new branch] gh/bobrenjc93/688/base -> origin/gh/bobrenjc93/688/base 2025-12-04T08:57:05.7762862Z * [new branch] gh/bobrenjc93/688/head -> origin/gh/bobrenjc93/688/head 2025-12-04T08:57:05.7764471Z * [new branch] gh/bobrenjc93/688/orig -> origin/gh/bobrenjc93/688/orig 2025-12-04T08:57:05.7766587Z * [new branch] gh/bobrenjc93/689/base -> origin/gh/bobrenjc93/689/base 2025-12-04T08:57:05.7768302Z * [new branch] gh/bobrenjc93/689/head -> origin/gh/bobrenjc93/689/head 2025-12-04T08:57:05.7769866Z * [new branch] gh/bobrenjc93/689/orig -> origin/gh/bobrenjc93/689/orig 2025-12-04T08:57:05.7771944Z * [new branch] gh/bobrenjc93/690/base -> origin/gh/bobrenjc93/690/base 2025-12-04T08:57:05.7773551Z * [new branch] gh/bobrenjc93/690/head -> origin/gh/bobrenjc93/690/head 2025-12-04T08:57:05.7775129Z * [new branch] gh/bobrenjc93/690/orig -> origin/gh/bobrenjc93/690/orig 2025-12-04T08:57:05.7777819Z * [new branch] gh/bobrenjc93/691/base -> origin/gh/bobrenjc93/691/base 2025-12-04T08:57:05.7779687Z * [new branch] gh/bobrenjc93/691/head -> origin/gh/bobrenjc93/691/head 2025-12-04T08:57:05.7781598Z * [new branch] gh/bobrenjc93/691/orig -> origin/gh/bobrenjc93/691/orig 2025-12-04T08:57:05.7784337Z * [new branch] gh/bobrenjc93/692/base -> origin/gh/bobrenjc93/692/base 2025-12-04T08:57:05.7785936Z * [new branch] gh/bobrenjc93/692/head -> 
origin/gh/bobrenjc93/692/head 2025-12-04T08:57:05.7787581Z * [new branch] gh/bobrenjc93/692/orig -> origin/gh/bobrenjc93/692/orig 2025-12-04T08:57:05.7789719Z * [new branch] gh/bobrenjc93/693/base -> origin/gh/bobrenjc93/693/base 2025-12-04T08:57:05.7791486Z * [new branch] gh/bobrenjc93/693/head -> origin/gh/bobrenjc93/693/head 2025-12-04T08:57:05.7793132Z * [new branch] gh/bobrenjc93/693/orig -> origin/gh/bobrenjc93/693/orig 2025-12-04T08:57:05.7795355Z * [new branch] gh/bobrenjc93/694/base -> origin/gh/bobrenjc93/694/base 2025-12-04T08:57:05.7796985Z * [new branch] gh/bobrenjc93/694/head -> origin/gh/bobrenjc93/694/head 2025-12-04T08:57:05.7798601Z * [new branch] gh/bobrenjc93/694/orig -> origin/gh/bobrenjc93/694/orig 2025-12-04T08:57:05.7800710Z * [new branch] gh/bobrenjc93/695/base -> origin/gh/bobrenjc93/695/base 2025-12-04T08:57:05.7802255Z * [new branch] gh/bobrenjc93/695/head -> origin/gh/bobrenjc93/695/head 2025-12-04T08:57:05.7804320Z * [new branch] gh/bobrenjc93/695/orig -> origin/gh/bobrenjc93/695/orig 2025-12-04T08:57:05.7806978Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-12-04T08:57:05.7808696Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-12-04T08:57:05.7810929Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-12-04T08:57:05.7812420Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-12-04T08:57:05.7813930Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-12-04T08:57:05.7815969Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-12-04T08:57:05.7817514Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-12-04T08:57:05.7819342Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-12-04T08:57:05.7821739Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-12-04T08:57:05.7823313Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-12-04T08:57:05.7824899Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 
2025-12-04T08:57:05.7827170Z * [new branch] gh/c00w/57/base -> origin/gh/c00w/57/base 2025-12-04T08:57:05.7828452Z * [new branch] gh/c00w/57/head -> origin/gh/c00w/57/head 2025-12-04T08:57:05.7830227Z * [new branch] gh/c00w/57/orig -> origin/gh/c00w/57/orig 2025-12-04T08:57:05.7832308Z * [new branch] gh/c00w/58/base -> origin/gh/c00w/58/base 2025-12-04T08:57:05.7833887Z * [new branch] gh/c00w/58/head -> origin/gh/c00w/58/head 2025-12-04T08:57:05.7835473Z * [new branch] gh/c00w/58/orig -> origin/gh/c00w/58/orig 2025-12-04T08:57:05.7838168Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-12-04T08:57:05.7839805Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-12-04T08:57:05.7841679Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-12-04T08:57:05.7844311Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-12-04T08:57:05.7845947Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-12-04T08:57:05.7848352Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-12-04T08:57:05.7849961Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-12-04T08:57:05.7851592Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-12-04T08:57:05.7853806Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-12-04T08:57:05.7855505Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-12-04T08:57:05.7857153Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-12-04T08:57:05.7859302Z * [new branch] gh/coconutruben/70/base -> origin/gh/coconutruben/70/base 2025-12-04T08:57:05.7860967Z * [new branch] gh/coconutruben/70/head -> origin/gh/coconutruben/70/head 2025-12-04T08:57:05.7862618Z * [new branch] gh/coconutruben/70/orig -> origin/gh/coconutruben/70/orig 2025-12-04T08:57:05.7864603Z * [new branch] gh/coconutruben/71/base -> 
origin/gh/coconutruben/71/base 2025-12-04T08:57:05.7866149Z * [new branch] gh/coconutruben/71/head -> origin/gh/coconutruben/71/head 2025-12-04T08:57:05.7867862Z * [new branch] gh/coconutruben/71/orig -> origin/gh/coconutruben/71/orig 2025-12-04T08:57:05.7869900Z * [new branch] gh/coconutruben/72/base -> origin/gh/coconutruben/72/base 2025-12-04T08:57:05.7871590Z * [new branch] gh/coconutruben/72/head -> origin/gh/coconutruben/72/head 2025-12-04T08:57:05.7873429Z * [new branch] gh/coconutruben/72/orig -> origin/gh/coconutruben/72/orig 2025-12-04T08:57:05.7875255Z * [new branch] gh/coconutruben/73/base -> origin/gh/coconutruben/73/base 2025-12-04T08:57:05.7876898Z * [new branch] gh/coconutruben/73/head -> origin/gh/coconutruben/73/head 2025-12-04T08:57:05.7878465Z * [new branch] gh/coconutruben/73/orig -> origin/gh/coconutruben/73/orig 2025-12-04T08:57:05.7880869Z * [new branch] gh/coconutruben/74/base -> origin/gh/coconutruben/74/base 2025-12-04T08:57:05.7882552Z * [new branch] gh/coconutruben/74/head -> origin/gh/coconutruben/74/head 2025-12-04T08:57:05.7884181Z * [new branch] gh/coconutruben/74/orig -> origin/gh/coconutruben/74/orig 2025-12-04T08:57:05.7886439Z * [new branch] gh/coconutruben/79/base -> origin/gh/coconutruben/79/base 2025-12-04T08:57:05.7888047Z * [new branch] gh/coconutruben/79/head -> origin/gh/coconutruben/79/head 2025-12-04T08:57:05.7889798Z * [new branch] gh/coconutruben/79/orig -> origin/gh/coconutruben/79/orig 2025-12-04T08:57:05.7892122Z * [new branch] gh/coconutruben/80/base -> origin/gh/coconutruben/80/base 2025-12-04T08:57:05.7893639Z * [new branch] gh/coconutruben/80/head -> origin/gh/coconutruben/80/head 2025-12-04T08:57:05.7895220Z * [new branch] gh/coconutruben/80/orig -> origin/gh/coconutruben/80/orig 2025-12-04T08:57:05.7897483Z * [new branch] gh/coconutruben/82/base -> origin/gh/coconutruben/82/base 2025-12-04T08:57:05.7899089Z * [new branch] gh/coconutruben/82/head -> origin/gh/coconutruben/82/head 2025-12-04T08:57:05.7900636Z * 
[new branch] gh/coconutruben/82/orig -> origin/gh/coconutruben/82/orig 2025-12-04T08:57:05.7902987Z * [new branch] gh/coconutruben/83/base -> origin/gh/coconutruben/83/base 2025-12-04T08:57:05.7904519Z * [new branch] gh/coconutruben/83/head -> origin/gh/coconutruben/83/head 2025-12-04T08:57:05.7906122Z * [new branch] gh/coconutruben/83/orig -> origin/gh/coconutruben/83/orig 2025-12-04T08:57:05.7908359Z * [new branch] gh/coconutruben/84/base -> origin/gh/coconutruben/84/base 2025-12-04T08:57:05.7910034Z * [new branch] gh/coconutruben/84/head -> origin/gh/coconutruben/84/head 2025-12-04T08:57:05.7911678Z * [new branch] gh/coconutruben/84/orig -> origin/gh/coconutruben/84/orig 2025-12-04T08:57:05.7913780Z * [new branch] gh/coconutruben/85/base -> origin/gh/coconutruben/85/base 2025-12-04T08:57:05.7915498Z * [new branch] gh/coconutruben/85/head -> origin/gh/coconutruben/85/head 2025-12-04T08:57:05.7917272Z * [new branch] gh/coconutruben/85/orig -> origin/gh/coconutruben/85/orig 2025-12-04T08:57:05.7921909Z * [new branch] gh/coconutruben/86/base -> origin/gh/coconutruben/86/base 2025-12-04T08:57:05.7923468Z * [new branch] gh/coconutruben/86/head -> origin/gh/coconutruben/86/head 2025-12-04T08:57:05.7925074Z * [new branch] gh/coconutruben/86/orig -> origin/gh/coconutruben/86/orig 2025-12-04T08:57:05.7927659Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-12-04T08:57:05.7929265Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-12-04T08:57:05.7931321Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-12-04T08:57:05.7932895Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-12-04T08:57:05.7934931Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-12-04T08:57:05.7936475Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-12-04T08:57:05.7938513Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 
2025-12-04T08:57:05.7940023Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-12-04T08:57:05.7942584Z * [new branch] gh/d4l3k/1/base -> origin/gh/d4l3k/1/base 2025-12-04T08:57:05.7944206Z * [new branch] gh/d4l3k/1/head -> origin/gh/d4l3k/1/head 2025-12-04T08:57:05.7946274Z * [new branch] gh/d4l3k/2/base -> origin/gh/d4l3k/2/base 2025-12-04T08:57:05.7947913Z * [new branch] gh/d4l3k/2/head -> origin/gh/d4l3k/2/head 2025-12-04T08:57:05.7949458Z * [new branch] gh/d4l3k/2/orig -> origin/gh/d4l3k/2/orig 2025-12-04T08:57:05.7951639Z * [new branch] gh/d4l3k/3/base -> origin/gh/d4l3k/3/base 2025-12-04T08:57:05.7953229Z * [new branch] gh/d4l3k/3/head -> origin/gh/d4l3k/3/head 2025-12-04T08:57:05.7954799Z * [new branch] gh/d4l3k/3/orig -> origin/gh/d4l3k/3/orig 2025-12-04T08:57:05.7957156Z * [new branch] gh/d4l3k/4/base -> origin/gh/d4l3k/4/base 2025-12-04T08:57:05.7958763Z * [new branch] gh/d4l3k/4/head -> origin/gh/d4l3k/4/head 2025-12-04T08:57:05.7960568Z * [new branch] gh/d4l3k/4/orig -> origin/gh/d4l3k/4/orig 2025-12-04T08:57:05.7962595Z * [new branch] gh/d4l3k/5/base -> origin/gh/d4l3k/5/base 2025-12-04T08:57:05.7964109Z * [new branch] gh/d4l3k/5/orig -> origin/gh/d4l3k/5/orig 2025-12-04T08:57:05.7966822Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-12-04T08:57:05.7968412Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-12-04T08:57:05.7969874Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-12-04T08:57:05.7972044Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-12-04T08:57:05.7973733Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-12-04T08:57:05.7975341Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-12-04T08:57:05.7977973Z * [new branch] gh/desertfire/605/base -> origin/gh/desertfire/605/base 2025-12-04T08:57:05.7979601Z * [new branch] 
gh/desertfire/605/head -> origin/gh/desertfire/605/head 2025-12-04T08:57:05.7981487Z * [new branch] gh/desertfire/605/orig -> origin/gh/desertfire/605/orig 2025-12-04T08:57:05.7983620Z * [new branch] gh/desertfire/606/base -> origin/gh/desertfire/606/base 2025-12-04T08:57:05.7985213Z * [new branch] gh/desertfire/606/head -> origin/gh/desertfire/606/head 2025-12-04T08:57:05.7986862Z * [new branch] gh/desertfire/606/orig -> origin/gh/desertfire/606/orig 2025-12-04T08:57:05.7988945Z * [new branch] gh/desertfire/607/base -> origin/gh/desertfire/607/base 2025-12-04T08:57:05.7990422Z * [new branch] gh/desertfire/607/head -> origin/gh/desertfire/607/head 2025-12-04T08:57:05.7992035Z * [new branch] gh/desertfire/607/orig -> origin/gh/desertfire/607/orig 2025-12-04T08:57:05.7994177Z * [new branch] gh/desertfire/608/base -> origin/gh/desertfire/608/base 2025-12-04T08:57:05.7995752Z * [new branch] gh/desertfire/608/head -> origin/gh/desertfire/608/head 2025-12-04T08:57:05.7997362Z * [new branch] gh/desertfire/608/orig -> origin/gh/desertfire/608/orig 2025-12-04T08:57:05.7999556Z * [new branch] gh/desertfire/609/base -> origin/gh/desertfire/609/base 2025-12-04T08:57:05.8001284Z * [new branch] gh/desertfire/609/head -> origin/gh/desertfire/609/head 2025-12-04T08:57:05.8002860Z * [new branch] gh/desertfire/609/orig -> origin/gh/desertfire/609/orig 2025-12-04T08:57:05.8005168Z * [new branch] gh/desertfire/610/base -> origin/gh/desertfire/610/base 2025-12-04T08:57:05.8006769Z * [new branch] gh/desertfire/610/head -> origin/gh/desertfire/610/head 2025-12-04T08:57:05.8008491Z * [new branch] gh/desertfire/610/orig -> origin/gh/desertfire/610/orig 2025-12-04T08:57:05.8010452Z * [new branch] gh/desertfire/611/base -> origin/gh/desertfire/611/base 2025-12-04T08:57:05.8012077Z * [new branch] gh/desertfire/611/head -> origin/gh/desertfire/611/head 2025-12-04T08:57:05.8013716Z * [new branch] gh/desertfire/611/orig -> origin/gh/desertfire/611/orig 2025-12-04T08:57:05.8015903Z * [new branch] 
gh/desertfire/612/base -> origin/gh/desertfire/612/base 2025-12-04T08:57:05.8017731Z * [new branch] gh/desertfire/612/head -> origin/gh/desertfire/612/head 2025-12-04T08:57:05.8019442Z * [new branch] gh/desertfire/612/orig -> origin/gh/desertfire/612/orig 2025-12-04T08:57:05.8022015Z * [new branch] gh/desertfire/613/base -> origin/gh/desertfire/613/base 2025-12-04T08:57:05.8023768Z * [new branch] gh/desertfire/613/head -> origin/gh/desertfire/613/head 2025-12-04T08:57:05.8025355Z * [new branch] gh/desertfire/613/orig -> origin/gh/desertfire/613/orig 2025-12-04T08:57:05.8027567Z * [new branch] gh/desertfire/614/base -> origin/gh/desertfire/614/base 2025-12-04T08:57:05.8029242Z * [new branch] gh/desertfire/614/head -> origin/gh/desertfire/614/head 2025-12-04T08:57:05.8030851Z * [new branch] gh/desertfire/614/orig -> origin/gh/desertfire/614/orig 2025-12-04T08:57:05.8033001Z * [new branch] gh/desertfire/615/base -> origin/gh/desertfire/615/base 2025-12-04T08:57:05.8034780Z * [new branch] gh/desertfire/615/head -> origin/gh/desertfire/615/head 2025-12-04T08:57:05.8036399Z * [new branch] gh/desertfire/615/orig -> origin/gh/desertfire/615/orig 2025-12-04T08:57:05.8038369Z * [new branch] gh/desertfire/616/base -> origin/gh/desertfire/616/base 2025-12-04T08:57:05.8039976Z * [new branch] gh/desertfire/616/head -> origin/gh/desertfire/616/head 2025-12-04T08:57:05.8041586Z * [new branch] gh/desertfire/616/orig -> origin/gh/desertfire/616/orig 2025-12-04T08:57:05.8043648Z * [new branch] gh/desertfire/617/base -> origin/gh/desertfire/617/base 2025-12-04T08:57:05.8045347Z * [new branch] gh/desertfire/617/head -> origin/gh/desertfire/617/head 2025-12-04T08:57:05.8046851Z * [new branch] gh/desertfire/617/orig -> origin/gh/desertfire/617/orig 2025-12-04T08:57:05.8049450Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-12-04T08:57:05.8051086Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-12-04T08:57:05.8053641Z * [new branch] gh/drisspg/170/base 
-> origin/gh/drisspg/170/base 2025-12-04T08:57:05.8055238Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-12-04T08:57:05.8056842Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-12-04T08:57:05.8058977Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T08:57:05.8060632Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-12-04T08:57:05.8062688Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T08:57:05.8064195Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T08:57:05.8066232Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T08:57:05.8067745Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T08:57:05.8069872Z * [new branch] gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T08:57:05.8071527Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T08:57:05.8073593Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T08:57:05.8075158Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T08:57:05.8076818Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T08:57:05.8078897Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T08:57:05.8080561Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T08:57:05.8082321Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T08:57:05.8084341Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T08:57:05.8086035Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T08:57:05.8087549Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T08:57:05.8089627Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 2025-12-04T08:57:05.8091233Z * [new branch] gh/drisspg/219/head -> 
origin/gh/drisspg/219/head 2025-12-04T08:57:05.8092827Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T08:57:05.8094998Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T08:57:05.8096624Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T08:57:05.8098228Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T08:57:05.8100273Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T08:57:05.8101868Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T08:57:05.8103964Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T08:57:05.8106066Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T08:57:05.8107661Z * [new branch] gh/drisspg/222/head -> origin/gh/drisspg/222/head 2025-12-04T08:57:05.8109253Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T08:57:05.8111435Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T08:57:05.8113026Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T08:57:05.8114578Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T08:57:05.8116724Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T08:57:05.8118536Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T08:57:05.8132758Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T08:57:05.8133061Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T08:57:05.8133258Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T08:57:05.8133416Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 2025-12-04T08:57:05.8133570Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 2025-12-04T08:57:05.8133738Z * [new branch] gh/drisspg/226/head -> 
origin/gh/drisspg/226/head 2025-12-04T08:57:05.8133887Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T08:57:05.8134047Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T08:57:05.8135236Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T08:57:05.8136787Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T08:57:05.8139022Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T08:57:05.8140812Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T08:57:05.8142203Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T08:57:05.8144352Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T08:57:05.8145950Z * [new branch] gh/drisspg/229/head -> origin/gh/drisspg/229/head 2025-12-04T08:57:05.8147612Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T08:57:05.8150088Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T08:57:05.8151433Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T08:57:05.8153050Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T08:57:05.8155632Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T08:57:05.8157275Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T08:57:05.8159824Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T08:57:05.8161623Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T08:57:05.8163838Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T08:57:05.8165621Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T08:57:05.8167339Z * [new branch] gh/dzmitry-huba/12/orig -> origin/gh/dzmitry-huba/12/orig 2025-12-04T08:57:05.8169622Z * [new branch] 
gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T08:57:05.8171271Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T08:57:05.8172857Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T08:57:05.8174938Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T08:57:05.8176570Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T08:57:05.8178157Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T08:57:05.8180356Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T08:57:05.8181951Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T08:57:05.8183469Z * [new branch] gh/dzmitry-huba/15/orig -> origin/gh/dzmitry-huba/15/orig 2025-12-04T08:57:05.8185725Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T08:57:05.8187434Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T08:57:05.8189158Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T08:57:05.8191472Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T08:57:05.8193074Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 2025-12-04T08:57:05.8194660Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T08:57:05.8196607Z * [new branch] gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T08:57:05.8198160Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T08:57:05.8200125Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T08:57:05.8201696Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T08:57:05.8204388Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 
2025-12-04T08:57:05.8205996Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T08:57:05.8207607Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T08:57:05.8210019Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T08:57:05.8211609Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T08:57:05.8213426Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T08:57:05.8215482Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T08:57:05.8217253Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T08:57:05.8219044Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T08:57:05.8221116Z * [new branch] gh/eellison/862/base -> origin/gh/eellison/862/base 2025-12-04T08:57:05.8222731Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T08:57:05.8224266Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T08:57:05.8226358Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T08:57:05.8227925Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T08:57:05.8229473Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T08:57:05.8231618Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T08:57:05.8233251Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T08:57:05.8234879Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T08:57:05.8237108Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T08:57:05.8238880Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T08:57:05.8240560Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 2025-12-04T08:57:05.8242690Z * [new branch] gh/eellison/866/base -> 
origin/gh/eellison/866/base 2025-12-04T08:57:05.8244330Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T08:57:05.8245883Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T08:57:05.8248119Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T08:57:05.8249548Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T08:57:05.8251217Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T08:57:05.8253555Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T08:57:05.8255372Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T08:57:05.8256971Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T08:57:05.8259099Z * [new branch] gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T08:57:05.8260578Z * [new branch] gh/eellison/869/head -> origin/gh/eellison/869/head 2025-12-04T08:57:05.8262133Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T08:57:05.8264268Z * [new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T08:57:05.8265791Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T08:57:05.8267810Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T08:57:05.8270069Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T08:57:05.8271684Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T08:57:05.8273330Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T08:57:05.8275532Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T08:57:05.8277327Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T08:57:05.8278824Z * [new branch] gh/eellison/872/orig -> origin/gh/eellison/872/orig 2025-12-04T08:57:05.8281393Z * [new branch] 
gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T08:57:05.8282693Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T08:57:05.8284280Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T08:57:05.8286419Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T08:57:05.8287994Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T08:57:05.8289641Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T08:57:05.8292238Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T08:57:05.8294021Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T08:57:05.8295599Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 2025-12-04T08:57:05.8297916Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T08:57:05.8299496Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T08:57:05.8301166Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T08:57:05.8303394Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T08:57:05.8304960Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T08:57:05.8306529Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T08:57:05.8308793Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T08:57:05.8310364Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T08:57:05.8311920Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T08:57:05.8314612Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T08:57:05.8316242Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T08:57:05.8319938Z * [new branch] gh/eellison/879/orig -> origin/gh/eellison/879/orig 
2025-12-04T08:57:05.8322213Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T08:57:05.8323844Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T08:57:05.8325432Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T08:57:05.8327672Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T08:57:05.8329331Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T08:57:05.8330968Z * [new branch] gh/eellison/881/orig -> origin/gh/eellison/881/orig 2025-12-04T08:57:05.8333127Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T08:57:05.8334718Z * [new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T08:57:05.8336530Z * [new branch] gh/eellison/882/orig -> origin/gh/eellison/882/orig 2025-12-04T08:57:05.8338531Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T08:57:05.8340035Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T08:57:05.8341566Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T08:57:05.8343870Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T08:57:05.8345388Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T08:57:05.8346924Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T08:57:05.8349480Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T08:57:05.8351013Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T08:57:05.8353349Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T08:57:05.8354973Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T08:57:05.8356565Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T08:57:05.8358626Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-12-04T08:57:05.8360381Z * 
[new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T08:57:05.8362028Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T08:57:05.8364367Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T08:57:05.8365986Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T08:57:05.8367583Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T08:57:05.8369985Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T08:57:05.8371669Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T08:57:05.8373311Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T08:57:05.8375547Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T08:57:05.8377590Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-12-04T08:57:05.8379193Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T08:57:05.8381706Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T08:57:05.8383381Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T08:57:05.8385014Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T08:57:05.8387258Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T08:57:05.8388917Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T08:57:05.8390646Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T08:57:05.8392735Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T08:57:05.8394472Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T08:57:05.8396057Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T08:57:05.8398120Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T08:57:05.8399729Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T08:57:05.8401455Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 
2025-12-04T08:57:05.8403845Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-12-04T08:57:05.8405478Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T08:57:05.8407081Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-12-04T08:57:05.8409298Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T08:57:05.8410982Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T08:57:05.8412760Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T08:57:05.8414927Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T08:57:05.8416566Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T08:57:05.8418383Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T08:57:05.8420570Z * [new branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T08:57:05.8422150Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T08:57:05.8424300Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T08:57:05.8425881Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T08:57:05.8427424Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T08:57:05.8429657Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T08:57:05.8431323Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T08:57:05.8432928Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T08:57:05.8435523Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T08:57:05.8437275Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T08:57:05.8438904Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T08:57:05.8441366Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T08:57:05.8443004Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 2025-12-04T08:57:05.8444668Z * [new branch] gh/etaf/178/orig -> 
origin/gh/etaf/178/orig 2025-12-04T08:57:05.8446874Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T08:57:05.8448466Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T08:57:05.8450012Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T08:57:05.8452321Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T08:57:05.8454237Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T08:57:05.8455809Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T08:57:05.8458433Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T08:57:05.8460522Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T08:57:05.8463514Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-12-04T08:57:05.8465008Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T08:57:05.8467197Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T08:57:05.8468896Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T08:57:05.8471048Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T08:57:05.8472684Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T08:57:05.8475513Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T08:57:05.8477129Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T08:57:05.8478723Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-12-04T08:57:05.8481129Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T08:57:05.8482607Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-12-04T08:57:05.8484179Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-12-04T08:57:05.8486287Z * [new branch] gh/ezyang/2974/base -> 
origin/gh/ezyang/2974/base 2025-12-04T08:57:05.8487843Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T08:57:05.8489321Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T08:57:05.8491444Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T08:57:05.8493063Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T08:57:05.8494737Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T08:57:05.8496841Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T08:57:05.8498469Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T08:57:05.8500012Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T08:57:05.8502059Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T08:57:05.8503609Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T08:57:05.8505209Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T08:57:05.8507345Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T08:57:05.8508913Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T08:57:05.8510544Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T08:57:05.8512680Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T08:57:05.8514268Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T08:57:05.8515969Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T08:57:05.8518347Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T08:57:05.8519961Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T08:57:05.8521622Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 2025-12-04T08:57:05.8523757Z * [new branch] gh/ezyang/3173/base -> 
origin/gh/ezyang/3173/base 2025-12-04T08:57:05.8525313Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T08:57:05.8527003Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T08:57:05.8529102Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T08:57:05.8530779Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T08:57:05.8532280Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T08:57:05.8534433Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T08:57:05.8536055Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T08:57:05.8537680Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T08:57:05.8539741Z * [new branch] gh/ezyang/3185/base -> origin/gh/ezyang/3185/base 2025-12-04T08:57:05.8541402Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T08:57:05.8542958Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T08:57:05.8545187Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T08:57:05.8546658Z * [new branch] gh/ezyang/3189/head -> origin/gh/ezyang/3189/head 2025-12-04T08:57:05.8548197Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T08:57:05.8550230Z * [new branch] gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T08:57:05.8551813Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T08:57:05.8553446Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T08:57:05.8556011Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T08:57:05.8557571Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T08:57:05.8559270Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 2025-12-04T08:57:05.8561570Z * [new branch] gh/ezyang/3193/base -> 
origin/gh/ezyang/3193/base 2025-12-04T08:57:05.8563116Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T08:57:05.8564749Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T08:57:05.8567433Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T08:57:05.8568995Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T08:57:05.8570654Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T08:57:05.8572803Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T08:57:05.8574442Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T08:57:05.8576032Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T08:57:05.8578218Z * [new branch] gh/ezyang/3196/base -> origin/gh/ezyang/3196/base 2025-12-04T08:57:05.8579804Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T08:57:05.8581449Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T08:57:05.8583752Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T08:57:05.8585344Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T08:57:05.8586941Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T08:57:05.8589102Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T08:57:05.8590730Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T08:57:05.8592359Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T08:57:05.8594585Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T08:57:05.8596165Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T08:57:05.8597813Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 2025-12-04T08:57:05.8599982Z * [new branch] gh/ezyang/3200/base -> 
origin/gh/ezyang/3200/base 2025-12-04T08:57:05.8601602Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T08:57:05.8603232Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T08:57:05.8605477Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T08:57:05.8607031Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T08:57:05.8608644Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T08:57:05.8610961Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T08:57:05.8612302Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T08:57:05.8613838Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T08:57:05.8616013Z * [new branch] gh/ezyang/3203/base -> origin/gh/ezyang/3203/base 2025-12-04T08:57:05.8617763Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 2025-12-04T08:57:05.8621192Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T08:57:05.8623690Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T08:57:05.8625306Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T08:57:05.8626926Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T08:57:05.8629122Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T08:57:05.8630657Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T08:57:05.8632252Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T08:57:05.8634696Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T08:57:05.8636285Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T08:57:05.8637947Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 2025-12-04T08:57:05.8640183Z * [new branch] gh/ezyang/3207/base -> 
origin/gh/ezyang/3207/base 2025-12-04T08:57:05.8641804Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T08:57:05.8643463Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T08:57:05.8645648Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T08:57:05.8647171Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T08:57:05.8648858Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T08:57:05.8651201Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T08:57:05.8652691Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T08:57:05.8654348Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T08:57:05.8656878Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 2025-12-04T08:57:05.8658435Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T08:57:05.8659982Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T08:57:05.8662199Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T08:57:05.8663783Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T08:57:05.8665393Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T08:57:05.8667445Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T08:57:05.8669094Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T08:57:05.8670652Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T08:57:05.8672807Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T08:57:05.8674593Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T08:57:05.8676047Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T08:57:05.8678192Z * [new branch] gh/fadara01/8/base -> origin/gh/fadara01/8/base 
2025-12-04T08:57:05.8679706Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T08:57:05.8681512Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T08:57:05.8683551Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T08:57:05.8685139Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T08:57:05.8686745Z * [new branch] gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T08:57:05.8689389Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-12-04T08:57:05.8690960Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T08:57:05.8692521Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T08:57:05.8694690Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T08:57:05.8696760Z * [new branch] gh/fduwjj/211/head -> origin/gh/fduwjj/211/head 2025-12-04T08:57:05.8698354Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T08:57:05.8700517Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T08:57:05.8702153Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T08:57:05.8703711Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T08:57:05.8705810Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T08:57:05.8707434Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T08:57:05.8709098Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T08:57:05.8711370Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T08:57:05.8712910Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T08:57:05.8714479Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T08:57:05.8716786Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 2025-12-04T08:57:05.8718582Z * [new branch] gh/fduwjj/229/head -> 
origin/gh/fduwjj/229/head 2025-12-04T08:57:05.8720170Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T08:57:05.8722309Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T08:57:05.8723904Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T08:57:05.8725459Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T08:57:05.8727633Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T08:57:05.8729248Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T08:57:05.8730848Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T08:57:05.8733066Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T08:57:05.8734605Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T08:57:05.8736234Z * [new branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T08:57:05.8738207Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T08:57:05.8739931Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T08:57:05.8741381Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T08:57:05.8743344Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T08:57:05.8744914Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T08:57:05.8746471Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T08:57:05.8748602Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T08:57:05.8750632Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T08:57:05.8752286Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T08:57:05.8754459Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T08:57:05.8756231Z * [new branch] gh/fduwjj/239/head -> origin/gh/fduwjj/239/head 2025-12-04T08:57:05.8757794Z * [new 
branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T08:57:05.8760368Z * [new branch] gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T08:57:05.8762004Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T08:57:05.8763707Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T08:57:05.8765880Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T08:57:05.8767492Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T08:57:05.8769690Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T08:57:05.8771842Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T08:57:05.8773401Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T08:57:05.8775101Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T08:57:05.8777369Z * [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T08:57:05.8778998Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T08:57:05.8780561Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T08:57:05.8783077Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T08:57:05.8784855Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T08:57:05.8786936Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T08:57:05.8788515Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T08:57:05.8790023Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T08:57:05.8792116Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T08:57:05.8793768Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T08:57:05.8795378Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T08:57:05.8797973Z * [new branch] gh/fffrog/181/base -> origin/gh/fffrog/181/base 2025-12-04T08:57:05.8800075Z * [new 
branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T08:57:05.8801791Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T08:57:05.8803872Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T08:57:05.8805509Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T08:57:05.8807136Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T08:57:05.8809734Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T08:57:05.8811248Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T08:57:05.8812841Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T08:57:05.8815329Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T08:57:05.8816779Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 2025-12-04T08:57:05.8818783Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T08:57:05.8820868Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T08:57:05.8822666Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T08:57:05.8824115Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T08:57:05.8826224Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T08:57:05.8827830Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T08:57:05.8829389Z * [new branch] gh/fxdawnn/13/orig -> origin/gh/fxdawnn/13/orig 2025-12-04T08:57:05.8831624Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T08:57:05.8833154Z * [new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T08:57:05.8834759Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T08:57:05.8836824Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T08:57:05.8838422Z * [new branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 
2025-12-04T08:57:05.8840169Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T08:57:05.8842323Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T08:57:05.8843906Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T08:57:05.8845516Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T08:57:05.8847652Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T08:57:05.8849301Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T08:57:05.8850966Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T08:57:05.8853164Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T08:57:05.8854636Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T08:57:05.8856183Z * [new branch] gh/fxdawnn/9/orig -> origin/gh/fxdawnn/9/orig 2025-12-04T08:57:05.8859080Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T08:57:05.8860363Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T08:57:05.8861946Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T08:57:05.8864079Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T08:57:05.8865682Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T08:57:05.8867336Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T08:57:05.8869519Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T08:57:05.8871243Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T08:57:05.8873075Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T08:57:05.8875708Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T08:57:05.8877265Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T08:57:05.8878845Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-12-04T08:57:05.8881150Z * [new branch] 
gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T08:57:05.8882733Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T08:57:05.8884308Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T08:57:05.8886408Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T08:57:05.8888034Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T08:57:05.8889632Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T08:57:05.8891743Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T08:57:05.8893353Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T08:57:05.8894926Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T08:57:05.8897509Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-12-04T08:57:05.8899099Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T08:57:05.8900846Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-12-04T08:57:05.8903008Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-12-04T08:57:05.8904546Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T08:57:05.8906169Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T08:57:05.8908314Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T08:57:05.8909816Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T08:57:05.8911438Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T08:57:05.8913525Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T08:57:05.8915144Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T08:57:05.8916701Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 
2025-12-04T08:57:05.8919050Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T08:57:05.8920643Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T08:57:05.8922290Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T08:57:05.8924499Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T08:57:05.8926124Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T08:57:05.8927774Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T08:57:05.8929880Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T08:57:05.8931501Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T08:57:05.8933000Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T08:57:05.8935142Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T08:57:05.8936852Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T08:57:05.8938311Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T08:57:05.8940388Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T08:57:05.8941937Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T08:57:05.8943513Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T08:57:05.8945690Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T08:57:05.8947257Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T08:57:05.8948920Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T08:57:05.8950983Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T08:57:05.8952606Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 2025-12-04T08:57:05.8954174Z * [new branch] gh/guangyey/208/orig -> 
origin/gh/guangyey/208/orig 2025-12-04T08:57:05.8956286Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T08:57:05.8957865Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T08:57:05.8959414Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T08:57:05.8962107Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T08:57:05.8963694Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T08:57:05.8965256Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T08:57:05.8967512Z * [new branch] gh/guangyey/231/base -> origin/gh/guangyey/231/base 2025-12-04T08:57:05.8969112Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head 2025-12-04T08:57:05.8970825Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 2025-12-04T08:57:05.8973002Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T08:57:05.8974540Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T08:57:05.8976125Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T08:57:05.8978312Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T08:57:05.8979906Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T08:57:05.8981475Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T08:57:05.8983737Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T08:57:05.8985281Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T08:57:05.8986894Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T08:57:05.8989119Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T08:57:05.8990700Z * [new branch] gh/guangyey/235/head -> origin/gh/guangyey/235/head 2025-12-04T08:57:05.8992258Z * [new branch] 
gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T08:57:05.8994418Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T08:57:05.8996032Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T08:57:05.8997558Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T08:57:05.8999962Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T08:57:05.9001543Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T08:57:05.9003189Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T08:57:05.9005292Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T08:57:05.9006861Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T08:57:05.9008999Z * [new branch] gh/guangyey/239/base -> origin/gh/guangyey/239/base 2025-12-04T08:57:05.9010654Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T08:57:05.9012243Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T08:57:05.9014434Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T08:57:05.9016013Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T08:57:05.9017775Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T08:57:05.9021471Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T08:57:05.9023064Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T08:57:05.9024589Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T08:57:05.9026757Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T08:57:05.9028367Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T08:57:05.9029956Z * [new branch] gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 
2025-12-04T08:57:05.9032130Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T08:57:05.9033803Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T08:57:05.9035369Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T08:57:05.9037633Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 2025-12-04T08:57:05.9039220Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T08:57:05.9041118Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T08:57:05.9043315Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T08:57:05.9044872Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T08:57:05.9046451Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T08:57:05.9048596Z * [new branch] gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T08:57:05.9050143Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T08:57:05.9051746Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T08:57:05.9053949Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T08:57:05.9055618Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T08:57:05.9057191Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T08:57:05.9059496Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T08:57:05.9061061Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T08:57:05.9062638Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T08:57:05.9065019Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T08:57:05.9066383Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 2025-12-04T08:57:05.9068080Z * [new branch] gh/guangyey/249/orig -> 
origin/gh/guangyey/249/orig 2025-12-04T08:57:05.9070181Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T08:57:05.9071796Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T08:57:05.9073382Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T08:57:05.9075464Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T08:57:05.9077166Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T08:57:05.9078777Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T08:57:05.9081187Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T08:57:05.9082768Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T08:57:05.9084293Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 2025-12-04T08:57:05.9086405Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T08:57:05.9088056Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T08:57:05.9089455Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T08:57:05.9091635Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T08:57:05.9093237Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T08:57:05.9094767Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T08:57:05.9096969Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T08:57:05.9098685Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T08:57:05.9100209Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T08:57:05.9102911Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T08:57:05.9104512Z * [new branch] gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 
2025-12-04T08:57:05.9106112Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T08:57:05.9108501Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T08:57:05.9110609Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T08:57:05.9112969Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T08:57:05.9115846Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T08:57:05.9118307Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T08:57:05.9122191Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T08:57:05.9125099Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T08:57:05.9127851Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T08:57:05.9128572Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T08:57:05.9130845Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T08:57:05.9132793Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T08:57:05.9133910Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T08:57:05.9136143Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T08:57:05.9137742Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T08:57:05.9139342Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T08:57:05.9141501Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T08:57:05.9143045Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 
2025-12-04T08:57:05.9144648Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T08:57:05.9146895Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T08:57:05.9148883Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T08:57:05.9150450Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T08:57:05.9152561Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T08:57:05.9154169Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T08:57:05.9155826Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T08:57:05.9157947Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T08:57:05.9159688Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T08:57:05.9161402Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T08:57:05.9163545Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T08:57:05.9165263Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T08:57:05.9166878Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T08:57:05.9169073Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-12-04T08:57:05.9170604Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T08:57:05.9172187Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T08:57:05.9174309Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T08:57:05.9175904Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 
2025-12-04T08:57:05.9177523Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T08:57:05.9179692Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T08:57:05.9181294Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T08:57:05.9182896Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T08:57:05.9184998Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T08:57:05.9186617Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T08:57:05.9188237Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T08:57:05.9192047Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T08:57:05.9192915Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T08:57:05.9193914Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T08:57:05.9196726Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T08:57:05.9198315Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T08:57:05.9200013Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T08:57:05.9202371Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T08:57:05.9204371Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T08:57:05.9205909Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T08:57:05.9208107Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T08:57:05.9209841Z * [new branch] gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 
2025-12-04T08:57:05.9211429Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T08:57:05.9213681Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T08:57:05.9215426Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T08:57:05.9216920Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T08:57:05.9219282Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T08:57:05.9221175Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T08:57:05.9222634Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T08:57:05.9224796Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T08:57:05.9226388Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T08:57:05.9228068Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T08:57:05.9230144Z * [new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T08:57:05.9231706Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T08:57:05.9233316Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T08:57:05.9235587Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T08:57:05.9237206Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T08:57:05.9238840Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T08:57:05.9241278Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T08:57:05.9242771Z * [new branch] gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 
2025-12-04T08:57:05.9244320Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T08:57:05.9246510Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T08:57:05.9248187Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T08:57:05.9249680Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T08:57:05.9251861Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T08:57:05.9253606Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T08:57:05.9255389Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T08:57:05.9257343Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T08:57:05.9258898Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T08:57:05.9260550Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T08:57:05.9262773Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T08:57:05.9264373Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T08:57:05.9265947Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T08:57:05.9268209Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T08:57:05.9269776Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T08:57:05.9271355Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T08:57:05.9273581Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T08:57:05.9275190Z * [new branch] gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 
2025-12-04T08:57:05.9276803Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T08:57:05.9279432Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T08:57:05.9281175Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T08:57:05.9283230Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T08:57:05.9284826Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T08:57:05.9286407Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T08:57:05.9288439Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 2025-12-04T08:57:05.9290116Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T08:57:05.9291784Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T08:57:05.9293796Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T08:57:05.9295381Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T08:57:05.9296948Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T08:57:05.9299572Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T08:57:05.9301565Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T08:57:05.9303664Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T08:57:05.9305824Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T08:57:05.9307926Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T08:57:05.9309986Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T08:57:05.9312628Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T08:57:05.9314220Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-12-04T08:57:05.9316897Z * [new branch] gh/isuruf/101/base -> 
origin/gh/isuruf/101/base 2025-12-04T08:57:05.9318874Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T08:57:05.9320959Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T08:57:05.9322486Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T08:57:05.9324073Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T08:57:05.9326208Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T08:57:05.9327851Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T08:57:05.9329870Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T08:57:05.9331381Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T08:57:05.9333528Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T08:57:05.9335719Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T08:57:05.9337313Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T08:57:05.9339508Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T08:57:05.9341142Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T08:57:05.9342727Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T08:57:05.9345234Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T08:57:05.9346825Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T08:57:05.9348442Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T08:57:05.9350519Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T08:57:05.9352234Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T08:57:05.9353741Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T08:57:05.9356121Z * [new branch] gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 
2025-12-04T08:57:05.9357690Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T08:57:05.9359225Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T08:57:05.9361475Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 2025-12-04T08:57:05.9363025Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T08:57:05.9364583Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T08:57:05.9366708Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T08:57:05.9368988Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T08:57:05.9370587Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T08:57:05.9372777Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T08:57:05.9374356Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T08:57:05.9375998Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T08:57:05.9378432Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T08:57:05.9380162Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T08:57:05.9382165Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T08:57:05.9383692Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T08:57:05.9385790Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T08:57:05.9387284Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T08:57:05.9389233Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T08:57:05.9390872Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T08:57:05.9392896Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T08:57:05.9394446Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 
2025-12-04T08:57:05.9396464Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T08:57:05.9398007Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T08:57:05.9400018Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T08:57:05.9401646Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T08:57:05.9403700Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T08:57:05.9405217Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T08:57:05.9407230Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T08:57:05.9409323Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T08:57:05.9411412Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T08:57:05.9412955Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T08:57:05.9414938Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T08:57:05.9416501Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T08:57:05.9419909Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T08:57:05.9421442Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T08:57:05.9424090Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T08:57:05.9425723Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T08:57:05.9427766Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T08:57:05.9429298Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T08:57:05.9432101Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T08:57:05.9433731Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T08:57:05.9435294Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 
2025-12-04T08:57:05.9437256Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T08:57:05.9438895Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T08:57:05.9440639Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T08:57:05.9443297Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T08:57:05.9444864Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T08:57:05.9446458Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T08:57:05.9448581Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T08:57:05.9450102Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T08:57:05.9451994Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T08:57:05.9454239Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T08:57:05.9455785Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T08:57:05.9457756Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T08:57:05.9459313Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T08:57:05.9461574Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T08:57:05.9463118Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T08:57:05.9465099Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T08:57:05.9466707Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T08:57:05.9468868Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T08:57:05.9470497Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T08:57:05.9472127Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T08:57:05.9474313Z * [new branch] gh/janeyx99/315/base -> 
origin/gh/janeyx99/315/base 2025-12-04T08:57:05.9475939Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T08:57:05.9477473Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T08:57:05.9479699Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T08:57:05.9481554Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T08:57:05.9483107Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T08:57:05.9485330Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T08:57:05.9486899Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T08:57:05.9488499Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T08:57:05.9490597Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T08:57:05.9492192Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T08:57:05.9493780Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T08:57:05.9495915Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T08:57:05.9497497Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T08:57:05.9499573Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T08:57:05.9501701Z * [new branch] gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T08:57:05.9503438Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T08:57:05.9505003Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T08:57:05.9507037Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T08:57:05.9508772Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T08:57:05.9510425Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 2025-12-04T08:57:05.9513007Z * [new branch] 
gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T08:57:05.9514619Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T08:57:05.9516316Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T08:57:05.9518804Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T08:57:05.9520211Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T08:57:05.9521859Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T08:57:05.9523988Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T08:57:05.9525628Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T08:57:05.9527236Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T08:57:05.9529336Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T08:57:05.9531031Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T08:57:05.9532584Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T08:57:05.9534856Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T08:57:05.9536389Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T08:57:05.9537935Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T08:57:05.9540457Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T08:57:05.9542080Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T08:57:05.9544183Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T08:57:05.9545838Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T08:57:05.9547626Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T08:57:05.9549842Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-12-04T08:57:05.9551393Z * [new branch] 
gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T08:57:05.9552964Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T08:57:05.9555094Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T08:57:05.9556660Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T08:57:05.9558218Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T08:57:05.9560285Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T08:57:05.9561990Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T08:57:05.9563494Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T08:57:05.9565654Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T08:57:05.9567318Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T08:57:05.9568902Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 2025-12-04T08:57:05.9571161Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T08:57:05.9572666Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T08:57:05.9574237Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T08:57:05.9576322Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T08:57:05.9577892Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T08:57:05.9579536Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T08:57:05.9581756Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T08:57:05.9583300Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T08:57:05.9584901Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T08:57:05.9587170Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T08:57:05.9589293Z * [new branch] gh/jansel/557/head -> origin/gh/jansel/557/head 
2025-12-04T08:57:05.9591686Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T08:57:05.9594737Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T08:57:05.9596963Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T08:57:05.9599160Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T08:57:05.9602317Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T08:57:05.9604440Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T08:57:05.9606847Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T08:57:05.9609449Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T08:57:05.9612126Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T08:57:05.9614320Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T08:57:05.9617360Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T08:57:05.9618997Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T08:57:05.9620651Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T08:57:05.9623316Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T08:57:05.9625278Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T08:57:05.9627305Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T08:57:05.9629514Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T08:57:05.9631183Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T08:57:05.9632770Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T08:57:05.9635528Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T08:57:05.9637128Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T08:57:05.9638746Z * [new branch] gh/jansel/564/orig -> 
origin/gh/jansel/564/orig 2025-12-04T08:57:05.9641164Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T08:57:05.9642794Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T08:57:05.9644545Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T08:57:05.9646774Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T08:57:05.9648411Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T08:57:05.9649965Z * [new branch] gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T08:57:05.9652135Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T08:57:05.9654125Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T08:57:05.9656051Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T08:57:05.9658243Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T08:57:05.9659851Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T08:57:05.9661528Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T08:57:05.9663704Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T08:57:05.9665353Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T08:57:05.9666906Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T08:57:05.9669041Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T08:57:05.9670585Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T08:57:05.9672692Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T08:57:05.9675299Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T08:57:05.9676885Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T08:57:05.9678448Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 2025-12-04T08:57:05.9680832Z * [new 
branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T08:57:05.9682338Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T08:57:05.9683909Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T08:57:05.9686126Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T08:57:05.9687805Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T08:57:05.9689409Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T08:57:05.9691626Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T08:57:05.9693191Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T08:57:05.9694762Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T08:57:05.9697015Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T08:57:05.9698582Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T08:57:05.9700100Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T08:57:05.9702799Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T08:57:05.9704410Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T08:57:05.9706107Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T08:57:05.9708695Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T08:57:05.9710237Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T08:57:05.9711797Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T08:57:05.9713982Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T08:57:05.9715487Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T08:57:05.9717215Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-12-04T08:57:05.9720062Z * [new branch] 
gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T08:57:05.9721932Z * [new branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T08:57:05.9723214Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T08:57:05.9725938Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T08:57:05.9727500Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T08:57:05.9729206Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T08:57:05.9731196Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T08:57:05.9732795Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T08:57:05.9734349Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T08:57:05.9736481Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T08:57:05.9738031Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T08:57:05.9739502Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T08:57:05.9742005Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T08:57:05.9743275Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T08:57:05.9745107Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T08:57:05.9747453Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T08:57:05.9748820Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T08:57:05.9750641Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T08:57:05.9752815Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T08:57:05.9754197Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T08:57:05.9755902Z * [new branch] gh/jiayisunx/79/orig -> origin/gh/jiayisunx/79/orig 
2025-12-04T08:57:05.9758027Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T08:57:05.9759516Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T08:57:05.9761321Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T08:57:05.9763319Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T08:57:05.9764984Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T08:57:05.9766597Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T08:57:05.9768600Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T08:57:05.9770212Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T08:57:05.9771816Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T08:57:05.9773951Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T08:57:05.9775485Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T08:57:05.9777009Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T08:57:05.9779033Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T08:57:05.9780669Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T08:57:05.9782196Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T08:57:05.9784581Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T08:57:05.9785804Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T08:57:05.9787579Z * [new branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T08:57:05.9789640Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T08:57:05.9791238Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T08:57:05.9792866Z * [new branch] gh/jiayisunx/88/orig -> 
origin/gh/jiayisunx/88/orig 2025-12-04T08:57:05.9795014Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T08:57:05.9796563Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T08:57:05.9798250Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T08:57:05.9800486Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T08:57:05.9802160Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T08:57:05.9803698Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T08:57:05.9806134Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T08:57:05.9807703Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T08:57:05.9810615Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T08:57:05.9812309Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T08:57:05.9813869Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T08:57:05.9815982Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T08:57:05.9817854Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T08:57:05.9821181Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T08:57:05.9823778Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T08:57:05.9825473Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T08:57:05.9827063Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T08:57:05.9829164Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T08:57:05.9830832Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T08:57:05.9832429Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T08:57:05.9834919Z * [new branch] 
gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T08:57:05.9836639Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T08:57:05.9838272Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T08:57:05.9840533Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T08:57:05.9842180Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T08:57:05.9843717Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T08:57:05.9845964Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T08:57:05.9847612Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T08:57:05.9849247Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T08:57:05.9851729Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T08:57:05.9853217Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T08:57:05.9854734Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T08:57:05.9856863Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 2025-12-04T08:57:05.9858882Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T08:57:05.9860435Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T08:57:05.9862569Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T08:57:05.9864053Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T08:57:05.9865617Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T08:57:05.9867822Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T08:57:05.9869627Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T08:57:05.9871285Z * [new branch] gh/karthickai/18/orig -> 
origin/gh/karthickai/18/orig 2025-12-04T08:57:05.9873478Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T08:57:05.9875530Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T08:57:05.9877335Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T08:57:05.9880126Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T08:57:05.9882337Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T08:57:05.9884005Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T08:57:05.9886132Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T08:57:05.9887854Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T08:57:05.9889616Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T08:57:05.9891927Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T08:57:05.9893457Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T08:57:05.9894987Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T08:57:05.9897287Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T08:57:05.9898981Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T08:57:05.9900540Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T08:57:05.9902723Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T08:57:05.9904356Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T08:57:05.9905999Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T08:57:05.9908584Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T08:57:05.9910253Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 
2025-12-04T08:57:05.9911816Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T08:57:05.9913877Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T08:57:05.9915562Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T08:57:05.9917624Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T08:57:05.9921016Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T08:57:05.9922939Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T08:57:05.9924494Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T08:57:05.9927158Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T08:57:05.9928676Z * [new branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T08:57:05.9930283Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T08:57:05.9932401Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T08:57:05.9933965Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T08:57:05.9935517Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T08:57:05.9938147Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T08:57:05.9939831Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T08:57:05.9941384Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T08:57:05.9943570Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T08:57:05.9945602Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T08:57:05.9947287Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T08:57:05.9949365Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T08:57:05.9950935Z * [new branch] gh/kurtamohler/62/head -> 
origin/gh/kurtamohler/62/head 2025-12-04T08:57:05.9952534Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T08:57:05.9954666Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T08:57:05.9956290Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T08:57:05.9957849Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T08:57:05.9960079Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T08:57:05.9961706Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T08:57:05.9963261Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T08:57:05.9965363Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T08:57:05.9966976Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T08:57:05.9968568Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T08:57:05.9970608Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T08:57:05.9972199Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T08:57:05.9973790Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T08:57:05.9975961Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T08:57:05.9977547Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T08:57:05.9979145Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T08:57:05.9981869Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T08:57:05.9983622Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T08:57:05.9985163Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T08:57:05.9987307Z * [new branch] gh/kwen2501/170/base -> 
origin/gh/kwen2501/170/base 2025-12-04T08:57:05.9988969Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T08:57:05.9991165Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T08:57:05.9992786Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T08:57:05.9994399Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-12-04T08:57:05.9996566Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T08:57:05.9998114Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T08:57:05.9999708Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T08:57:06.0001993Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T08:57:06.0003570Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T08:57:06.0005650Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T08:57:06.0007209Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T08:57:06.0008794Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T08:57:06.0010961Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T08:57:06.0012543Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T08:57:06.0014128Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T08:57:06.0016329Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T08:57:06.0018179Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T08:57:06.0019717Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T08:57:06.0021885Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T08:57:06.0023475Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T08:57:06.0025051Z * [new branch] 
gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T08:57:06.0027165Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T08:57:06.0028736Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T08:57:06.0030283Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T08:57:06.0032456Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T08:57:06.0034466Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T08:57:06.0036054Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T08:57:06.0038207Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T08:57:06.0039772Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T08:57:06.0041505Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T08:57:06.0043656Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T08:57:06.0045243Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T08:57:06.0046946Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T08:57:06.0049040Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T08:57:06.0050502Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T08:57:06.0052079Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T08:57:06.0054262Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T08:57:06.0055828Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T08:57:06.0057388Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T08:57:06.0059503Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T08:57:06.0061859Z * [new branch] gh/kwen2501/252/head -> origin/gh/kwen2501/252/head 
2025-12-04T08:57:06.0063470Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T08:57:06.0066111Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T08:57:06.0067772Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T08:57:06.0069407Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T08:57:06.0071642Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T08:57:06.0073288Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T08:57:06.0074920Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T08:57:06.0077147Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T08:57:06.0078714Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T08:57:06.0080352Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T08:57:06.0082523Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T08:57:06.0084177Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T08:57:06.0085759Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T08:57:06.0088052Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T08:57:06.0089833Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T08:57:06.0091280Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T08:57:06.0093602Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T08:57:06.0095301Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T08:57:06.0096814Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T08:57:06.0099132Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T08:57:06.0100709Z * [new branch] gh/kwen2501/274/head -> 
origin/gh/kwen2501/274/head 2025-12-04T08:57:06.0102408Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T08:57:06.0104659Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T08:57:06.0106330Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T08:57:06.0107915Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T08:57:06.0110149Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T08:57:06.0111868Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T08:57:06.0113329Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T08:57:06.0115920Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T08:57:06.0117823Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T08:57:06.0119403Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T08:57:06.0121820Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T08:57:06.0123364Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T08:57:06.0124963Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T08:57:06.0127209Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T08:57:06.0129029Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T08:57:06.0130587Z * [new branch] gh/kwen2501/279/orig -> origin/gh/kwen2501/279/orig 2025-12-04T08:57:06.0132797Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T08:57:06.0134399Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T08:57:06.0136012Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T08:57:06.0138195Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T08:57:06.0139794Z * [new branch] 
gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T08:57:06.0141352Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T08:57:06.0143718Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T08:57:06.0145343Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T08:57:06.0146923Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T08:57:06.0149082Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T08:57:06.0150641Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T08:57:06.0152256Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T08:57:06.0154454Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T08:57:06.0156159Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T08:57:06.0157733Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T08:57:06.0160057Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T08:57:06.0161718Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T08:57:06.0163298Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T08:57:06.0165555Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T08:57:06.0167160Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T08:57:06.0168761Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T08:57:06.0170872Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T08:57:06.0172442Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T08:57:06.0174053Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T08:57:06.0176456Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 
2025-12-04T08:57:06.0177690Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T08:57:06.0179522Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T08:57:06.0182588Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T08:57:06.0184184Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T08:57:06.0185851Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T08:57:06.0187965Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T08:57:06.0189498Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T08:57:06.0191057Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T08:57:06.0193286Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T08:57:06.0195280Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T08:57:06.0197273Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T08:57:06.0198898Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T08:57:06.0200999Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T08:57:06.0202306Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T08:57:06.0204897Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T08:57:06.0206208Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T08:57:06.0208033Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T08:57:06.0210311Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T08:57:06.0211809Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T08:57:06.0213310Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 
2025-12-04T08:57:06.0215530Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T08:57:06.0216975Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T08:57:06.0230438Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T08:57:06.0230963Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T08:57:06.0231382Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T08:57:06.0231789Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T08:57:06.0232211Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T08:57:06.0232610Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T08:57:06.0232997Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T08:57:06.0233389Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T08:57:06.0234836Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T08:57:06.0236851Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T08:57:06.0238404Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T08:57:06.0240166Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T08:57:06.0242340Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T08:57:06.0243889Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T08:57:06.0245457Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T08:57:06.0247640Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T08:57:06.0249463Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T08:57:06.0251052Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 
2025-12-04T08:57:06.0253374Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T08:57:06.0254894Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T08:57:06.0256480Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T08:57:06.0258652Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T08:57:06.0260209Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T08:57:06.0261863Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T08:57:06.0264316Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 2025-12-04T08:57:06.0265916Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T08:57:06.0267518Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T08:57:06.0269795Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T08:57:06.0271510Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T08:57:06.0273102Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T08:57:06.0275271Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T08:57:06.0276887Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T08:57:06.0278893Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T08:57:06.0281621Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T08:57:06.0283621Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T08:57:06.0285321Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T08:57:06.0289322Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T08:57:06.0290796Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T08:57:06.0293460Z * 
[new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T08:57:06.0295053Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T08:57:06.0296624Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T08:57:06.0298716Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T08:57:06.0300268Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T08:57:06.0301906Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T08:57:06.0303950Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T08:57:06.0305500Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T08:57:06.0307099Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T08:57:06.0309717Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T08:57:06.0311792Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T08:57:06.0313359Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T08:57:06.0315018Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T08:57:06.0317701Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T08:57:06.0319394Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T08:57:06.0321098Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T08:57:06.0323381Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T08:57:06.0325022Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T08:57:06.0327609Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T08:57:06.0329193Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T08:57:06.0330769Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T08:57:06.0332881Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T08:57:06.0334545Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 
2025-12-04T08:57:06.0336282Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T08:57:06.0339036Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T08:57:06.0340406Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 2025-12-04T08:57:06.0341871Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T08:57:06.0343925Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T08:57:06.0345475Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T08:57:06.0347091Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T08:57:06.0349188Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T08:57:06.0350776Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T08:57:06.0352368Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T08:57:06.0354439Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T08:57:06.0356058Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T08:57:06.0357636Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T08:57:06.0359797Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T08:57:06.0361525Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T08:57:06.0363071Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T08:57:06.0365079Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T08:57:06.0366674Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T08:57:06.0368352Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T08:57:06.0370449Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T08:57:06.0372021Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T08:57:06.0373768Z * [new branch] gh/malfet/586/orig -> 
origin/gh/malfet/586/orig 2025-12-04T08:57:06.0376170Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T08:57:06.0377765Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T08:57:06.0379321Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T08:57:06.0381449Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T08:57:06.0383033Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T08:57:06.0384739Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T08:57:06.0386859Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T08:57:06.0388405Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T08:57:06.0390562Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T08:57:06.0392663Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T08:57:06.0394240Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T08:57:06.0395924Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T08:57:06.0398452Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T08:57:06.0400228Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T08:57:06.0401882Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T08:57:06.0404067Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T08:57:06.0405638Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T08:57:06.0407259Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T08:57:06.0409405Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T08:57:06.0411036Z * [new branch] gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T08:57:06.0412628Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T08:57:06.0414768Z * [new 
branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T08:57:06.0416387Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T08:57:06.0418273Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T08:57:06.0420298Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T08:57:06.0421925Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T08:57:06.0423580Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T08:57:06.0425780Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T08:57:06.0427290Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T08:57:06.0428870Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T08:57:06.0431036Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T08:57:06.0432609Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T08:57:06.0434148Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T08:57:06.0436344Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T08:57:06.0437954Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T08:57:06.0439658Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T08:57:06.0441881Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T08:57:06.0443481Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T08:57:06.0445095Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T08:57:06.0447264Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T08:57:06.0448845Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T08:57:06.0450409Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T08:57:06.0452534Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 
2025-12-04T08:57:06.0454229Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T08:57:06.0455832Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T08:57:06.0458018Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T08:57:06.0459698Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T08:57:06.0461271Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T08:57:06.0463362Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T08:57:06.0464937Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T08:57:06.0466520Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T08:57:06.0468699Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T08:57:06.0470234Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T08:57:06.0471869Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T08:57:06.0474114Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T08:57:06.0475672Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T08:57:06.0477289Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T08:57:06.0479477Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 2025-12-04T08:57:06.0481256Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T08:57:06.0482907Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T08:57:06.0485072Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T08:57:06.0486714Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T08:57:06.0488349Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T08:57:06.0490532Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T08:57:06.0492088Z * [new branch] gh/malfet/608/head -> 
origin/gh/malfet/608/head 2025-12-04T08:57:06.0493637Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T08:57:06.0495908Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T08:57:06.0497476Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T08:57:06.0499095Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T08:57:06.0501360Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T08:57:06.0503130Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T08:57:06.0504561Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T08:57:06.0506753Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T08:57:06.0508531Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T08:57:06.0510100Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T08:57:06.0512100Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T08:57:06.0513688Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T08:57:06.0515278Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T08:57:06.0517678Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T08:57:06.0519260Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T08:57:06.0522062Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T08:57:06.0523638Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T08:57:06.0525178Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T08:57:06.0527953Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T08:57:06.0530608Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T08:57:06.0532190Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 
2025-12-04T08:57:06.0533797Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T08:57:06.0536352Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T08:57:06.0538000Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T08:57:06.0540016Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T08:57:06.0541585Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T08:57:06.0543555Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T08:57:06.0545169Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T08:57:06.0547213Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T08:57:06.0548809Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T08:57:06.0550753Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-12-04T08:57:06.0552766Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T08:57:06.0554761Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T08:57:06.0556283Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T08:57:06.0558273Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T08:57:06.0559779Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T08:57:06.0562677Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T08:57:06.0564242Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T08:57:06.0566340Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T08:57:06.0567882Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T08:57:06.0570122Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T08:57:06.0571540Z 
* [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T08:57:06.0573577Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T08:57:06.0575080Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T08:57:06.0577200Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T08:57:06.0578735Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T08:57:06.0580885Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T08:57:06.0582532Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T08:57:06.0584144Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T08:57:06.0586421Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T08:57:06.0587940Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T08:57:06.0589603Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T08:57:06.0591863Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T08:57:06.0593364Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T08:57:06.0594983Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T08:57:06.0597202Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T08:57:06.0598791Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T08:57:06.0600493Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T08:57:06.0602721Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 
2025-12-04T08:57:06.0604291Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T08:57:06.0605928Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T08:57:06.0608116Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T08:57:06.0609703Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T08:57:06.0611223Z * [new branch] gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T08:57:06.0613435Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T08:57:06.0615012Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T08:57:06.0616568Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T08:57:06.0619318Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T08:57:06.0620888Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T08:57:06.0622514Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T08:57:06.0624767Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T08:57:06.0626461Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T08:57:06.0628151Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T08:57:06.0630720Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T08:57:06.0632723Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T08:57:06.0634333Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T08:57:06.0636289Z * [new branch] gh/mikaylagawarecki/354/base -> 
origin/gh/mikaylagawarecki/354/base 2025-12-04T08:57:06.0637881Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T08:57:06.0639491Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T08:57:06.0642330Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T08:57:06.0643908Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T08:57:06.0645506Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T08:57:06.0647567Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T08:57:06.0649185Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T08:57:06.0650657Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T08:57:06.0653050Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T08:57:06.0654704Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T08:57:06.0656333Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T08:57:06.0658477Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T08:57:06.0660128Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T08:57:06.0661768Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T08:57:06.0664053Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T08:57:06.0665640Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T08:57:06.0667246Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T08:57:06.0669511Z * [new branch] 
gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 2025-12-04T08:57:06.0671205Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T08:57:06.0673014Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T08:57:06.0675418Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T08:57:06.0677297Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T08:57:06.0678873Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T08:57:06.0681634Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T08:57:06.0683173Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T08:57:06.0684799Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T08:57:06.0687112Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T08:57:06.0688696Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T08:57:06.0690471Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T08:57:06.0692581Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T08:57:06.0694022Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T08:57:06.0695641Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T08:57:06.0697891Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T08:57:06.0699594Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T08:57:06.0701282Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 
2025-12-04T08:57:06.0703605Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T08:57:06.0705171Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T08:57:06.0706862Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T08:57:06.0709070Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T08:57:06.0710666Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T08:57:06.0712212Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T08:57:06.0714483Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T08:57:06.0716089Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T08:57:06.0717703Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T08:57:06.0722248Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T08:57:06.0723702Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T08:57:06.0725299Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T08:57:06.0727526Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T08:57:06.0729367Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T08:57:06.0730960Z * [new branch] gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig 2025-12-04T08:57:06.0733038Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T08:57:06.0734616Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T08:57:06.0736216Z * [new branch] gh/mikaylagawarecki/373/orig -> 
origin/gh/mikaylagawarecki/373/orig 2025-12-04T08:57:06.0738451Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T08:57:06.0739960Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T08:57:06.0741534Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T08:57:06.0743806Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T08:57:06.0745439Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T08:57:06.0747038Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T08:57:06.0749665Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T08:57:06.0751921Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T08:57:06.0753744Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T08:57:06.0755763Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T08:57:06.0757356Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T08:57:06.0758991Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T08:57:06.0761332Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T08:57:06.0762920Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T08:57:06.0764592Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T08:57:06.0766918Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T08:57:06.0768458Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T08:57:06.0770035Z * [new branch] 
gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T08:57:06.0772086Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T08:57:06.0773692Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T08:57:06.0775245Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T08:57:06.0777294Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T08:57:06.0778891Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T08:57:06.0780393Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T08:57:06.0782550Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T08:57:06.0784115Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T08:57:06.0785647Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T08:57:06.0788006Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T08:57:06.0789597Z * [new branch] gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T08:57:06.0791221Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T08:57:06.0793950Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T08:57:06.0795515Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T08:57:06.0797092Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T08:57:06.0799751Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T08:57:06.0801528Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 
2025-12-04T08:57:06.0803261Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T08:57:06.0805525Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T08:57:06.0807019Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T08:57:06.0808590Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T08:57:06.0810792Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T08:57:06.0812484Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T08:57:06.0814037Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T08:57:06.0816118Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T08:57:06.0817878Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T08:57:06.0819537Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T08:57:06.0821793Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T08:57:06.0823376Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T08:57:06.0824895Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T08:57:06.0827150Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T08:57:06.0828658Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T08:57:06.0830229Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T08:57:06.0832567Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T08:57:06.0834327Z * [new branch] gh/mikaylagawarecki/391/head -> 
origin/gh/mikaylagawarecki/391/head 2025-12-04T08:57:06.0835949Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T08:57:06.0838237Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T08:57:06.0839830Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T08:57:06.0841575Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T08:57:06.0844091Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T08:57:06.0845648Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T08:57:06.0847200Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T08:57:06.0849416Z * [new branch] gh/mlazos/42/base -> origin/gh/mlazos/42/base 2025-12-04T08:57:06.0850948Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T08:57:06.0852584Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T08:57:06.0854553Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T08:57:06.0856151Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T08:57:06.0857799Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T08:57:06.0859788Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T08:57:06.0861326Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T08:57:06.0862901Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T08:57:06.0864958Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T08:57:06.0866557Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T08:57:06.0868119Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T08:57:06.0870124Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T08:57:06.0872214Z * [new branch] gh/mlazos/48/head -> 
origin/gh/mlazos/48/head 2025-12-04T08:57:06.0874041Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T08:57:06.0876021Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T08:57:06.0877757Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T08:57:06.0879582Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T08:57:06.0882000Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T08:57:06.0883061Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T08:57:06.0884958Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T08:57:06.0887018Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T08:57:06.0888576Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T08:57:06.0890217Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T08:57:06.0892401Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T08:57:06.0893983Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T08:57:06.0895785Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T08:57:06.0898009Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T08:57:06.0899597Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T08:57:06.0901228Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T08:57:06.0903232Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T08:57:06.0904780Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T08:57:06.0906341Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T08:57:06.0908430Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T08:57:06.0909972Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T08:57:06.0911561Z * [new branch] gh/mlazos/55/orig -> 
origin/gh/mlazos/55/orig 2025-12-04T08:57:06.0913731Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T08:57:06.0915352Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T08:57:06.0917130Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T08:57:06.0919301Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 2025-12-04T08:57:06.0921006Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T08:57:06.0922516Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T08:57:06.0924978Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T08:57:06.0926635Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T08:57:06.0928248Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T08:57:06.0930506Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T08:57:06.0932041Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T08:57:06.0933520Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T08:57:06.0935670Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T08:57:06.0937242Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T08:57:06.0939227Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T08:57:06.0941593Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T08:57:06.0943230Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T08:57:06.0945895Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T08:57:06.0947713Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T08:57:06.0948681Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T08:57:06.0950382Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T08:57:06.0952617Z * [new branch] gh/mlazos/63/base -> 
origin/gh/mlazos/63/base 2025-12-04T08:57:06.0954256Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T08:57:06.0955896Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T08:57:06.0958052Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T08:57:06.0959529Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T08:57:06.0961407Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T08:57:06.0963972Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T08:57:06.0965588Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T08:57:06.0967117Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T08:57:06.0969275Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T08:57:06.0970948Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T08:57:06.0972505Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T08:57:06.0974655Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T08:57:06.0976217Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T08:57:06.0977817Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T08:57:06.0979935Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T08:57:06.0981540Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T08:57:06.0983167Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T08:57:06.0985401Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T08:57:06.0986998Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T08:57:06.0988561Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T08:57:06.0990699Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T08:57:06.0992393Z * [new branch] gh/mlazos/70/head -> 
origin/gh/mlazos/70/head 2025-12-04T08:57:06.0993949Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T08:57:06.0996137Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T08:57:06.0997715Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T08:57:06.0999332Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T08:57:06.1001779Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T08:57:06.1003329Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T08:57:06.1005072Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T08:57:06.1007175Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T08:57:06.1008814Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T08:57:06.1010245Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T08:57:06.1012905Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T08:57:06.1014583Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T08:57:06.1017334Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T08:57:06.1020769Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T08:57:06.1022374Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T08:57:06.1025086Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T08:57:06.1026683Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T08:57:06.1028388Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T08:57:06.1030498Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T08:57:06.1032089Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T08:57:06.1033686Z * [new branch] gh/naveenthangudu/2/orig -> 
origin/gh/naveenthangudu/2/orig 2025-12-04T08:57:06.1035726Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T08:57:06.1037325Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T08:57:06.1039034Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T08:57:06.1041364Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T08:57:06.1042915Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T08:57:06.1044531Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T08:57:06.1046802Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T08:57:06.1048502Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T08:57:06.1050237Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T08:57:06.1052342Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T08:57:06.1054035Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T08:57:06.1055517Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T08:57:06.1057596Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T08:57:06.1059214Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T08:57:06.1060690Z * [new branch] gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T08:57:06.1062749Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T08:57:06.1064320Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T08:57:06.1065998Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T08:57:06.1068199Z * [new branch] gh/naveenthangudu/9/base -> 
origin/gh/naveenthangudu/9/base 2025-12-04T08:57:06.1069772Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T08:57:06.1071608Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T08:57:06.1073933Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T08:57:06.1075486Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T08:57:06.1077051Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T08:57:06.1079207Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T08:57:06.1081391Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T08:57:06.1082878Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T08:57:06.1084895Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T08:57:06.1086526Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T08:57:06.1088084Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T08:57:06.1090153Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T08:57:06.1091736Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T08:57:06.1093279Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T08:57:06.1095410Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T08:57:06.1097140Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T08:57:06.1098706Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T08:57:06.1101324Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T08:57:06.1102959Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T08:57:06.1104543Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T08:57:06.1106566Z * [new 
branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T08:57:06.1108153Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T08:57:06.1109773Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T08:57:06.1111933Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T08:57:06.1113524Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T08:57:06.1115131Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T08:57:06.1117518Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T08:57:06.1119255Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T08:57:06.1120943Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T08:57:06.1122987Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T08:57:06.1124550Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T08:57:06.1126150Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 2025-12-04T08:57:06.1128322Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T08:57:06.1129915Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T08:57:06.1131465Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T08:57:06.1133623Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T08:57:06.1135372Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T08:57:06.1136803Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T08:57:06.1138959Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T08:57:06.1140493Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T08:57:06.1142086Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T08:57:06.1144174Z * [new branch] 
gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T08:57:06.1145781Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T08:57:06.1147308Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T08:57:06.1149933Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T08:57:06.1151589Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T08:57:06.1153652Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T08:57:06.1155761Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T08:57:06.1157343Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T08:57:06.1158900Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T08:57:06.1161433Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T08:57:06.1162896Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T08:57:06.1164463Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T08:57:06.1166565Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T08:57:06.1168170Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T08:57:06.1169603Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T08:57:06.1171678Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T08:57:06.1173262Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T08:57:06.1174963Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T08:57:06.1177052Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T08:57:06.1179008Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T08:57:06.1180240Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T08:57:06.1182673Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T08:57:06.1183864Z * [new branch] 
gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T08:57:06.1185739Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T08:57:06.1187813Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T08:57:06.1189257Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T08:57:06.1190909Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T08:57:06.1193018Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T08:57:06.1194648Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T08:57:06.1196369Z * [new branch] gh/oulgen/18/orig -> origin/gh/oulgen/18/orig 2025-12-04T08:57:06.1198675Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T08:57:06.1200692Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T08:57:06.1202141Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T08:57:06.1204169Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T08:57:06.1205705Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T08:57:06.1207299Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T08:57:06.1209326Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T08:57:06.1210988Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T08:57:06.1212522Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T08:57:06.1215057Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T08:57:06.1216645Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T08:57:06.1218587Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T08:57:06.1220566Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T08:57:06.1222176Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T08:57:06.1223708Z * [new branch] gh/oulgen/23/orig 
-> origin/gh/oulgen/23/orig 2025-12-04T08:57:06.1225737Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T08:57:06.1227309Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T08:57:06.1228909Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T08:57:06.1230961Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T08:57:06.1232508Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T08:57:06.1234118Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T08:57:06.1236340Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T08:57:06.1237834Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T08:57:06.1239505Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T08:57:06.1241860Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T08:57:06.1243412Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T08:57:06.1244912Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T08:57:06.1247474Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T08:57:06.1249051Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T08:57:06.1251131Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T08:57:06.1253698Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T08:57:06.1255327Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T08:57:06.1256885Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T08:57:06.1259154Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T08:57:06.1260762Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T08:57:06.1262444Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T08:57:06.1265273Z * [new branch] gh/patvig/mtia-serialization -> 
origin/gh/patvig/mtia-serialization 2025-12-04T08:57:06.1268192Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T08:57:06.1269973Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-12-04T08:57:06.1271578Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T08:57:06.1273734Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T08:57:06.1275395Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T08:57:06.1276967Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T08:57:06.1279030Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T08:57:06.1280832Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T08:57:06.1282432Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T08:57:06.1284578Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T08:57:06.1286172Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T08:57:06.1287774Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T08:57:06.1289910Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T08:57:06.1291690Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T08:57:06.1293269Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T08:57:06.1295334Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T08:57:06.1296908Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T08:57:06.1298497Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T08:57:06.1300491Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T08:57:06.1302045Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T08:57:06.1303642Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T08:57:06.1305837Z * [new branch] gh/pearu/117/base -> 
origin/gh/pearu/117/base 2025-12-04T08:57:06.1307411Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T08:57:06.1309005Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T08:57:06.1311188Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T08:57:06.1312760Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T08:57:06.1314333Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T08:57:06.1316450Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T08:57:06.1318221Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T08:57:06.1319788Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T08:57:06.1322015Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T08:57:06.1323600Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T08:57:06.1325161Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T08:57:06.1327361Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T08:57:06.1328932Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T08:57:06.1330689Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T08:57:06.1332671Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T08:57:06.1334206Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T08:57:06.1335762Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T08:57:06.1337899Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T08:57:06.1339448Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T08:57:06.1341062Z * [new branch] gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T08:57:06.1343220Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T08:57:06.1344818Z * [new branch] gh/pearu/147/head -> 
origin/gh/pearu/147/head 2025-12-04T08:57:06.1346382Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T08:57:06.1348619Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T08:57:06.1350166Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T08:57:06.1351745Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T08:57:06.1354366Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T08:57:06.1355966Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T08:57:06.1357572Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T08:57:06.1359857Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T08:57:06.1361707Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T08:57:06.1363843Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T08:57:06.1366763Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T08:57:06.1368342Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T08:57:06.1369836Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T08:57:06.1372078Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T08:57:06.1373717Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T08:57:06.1375807Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T08:57:06.1377983Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T08:57:06.1379613Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T08:57:06.1381646Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T08:57:06.1383851Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T08:57:06.1385428Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T08:57:06.1386993Z * [new branch] gh/pearu/155/orig -> 
origin/gh/pearu/155/orig 2025-12-04T08:57:06.1389158Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T08:57:06.1390698Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T08:57:06.1392239Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T08:57:06.1394900Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T08:57:06.1396922Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T08:57:06.1398671Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T08:57:06.1401154Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T08:57:06.1402792Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T08:57:06.1404491Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T08:57:06.1407008Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T08:57:06.1408560Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T08:57:06.1410666Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T08:57:06.1412243Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T08:57:06.1413809Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T08:57:06.1416103Z * [new branch] gh/pianpwk/29/base -> origin/gh/pianpwk/29/base 2025-12-04T08:57:06.1420025Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T08:57:06.1421666Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T08:57:06.1424318Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T08:57:06.1425840Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T08:57:06.1427479Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T08:57:06.1429733Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T08:57:06.1431335Z * [new branch] gh/pianpwk/31/head -> 
origin/gh/pianpwk/31/head 2025-12-04T08:57:06.1432914Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T08:57:06.1434878Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T08:57:06.1436470Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T08:57:06.1437994Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T08:57:06.1440010Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T08:57:06.1441677Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T08:57:06.1443273Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T08:57:06.1445624Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T08:57:06.1447427Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T08:57:06.1449269Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T08:57:06.1451540Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T08:57:06.1453151Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T08:57:06.1454787Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T08:57:06.1457340Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T08:57:06.1458951Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T08:57:06.1461159Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T08:57:06.1462747Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T08:57:06.1464324Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T08:57:06.1466408Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T08:57:06.1468272Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T08:57:06.1469646Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T08:57:06.1471709Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 
2025-12-04T08:57:06.1473296Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T08:57:06.1474873Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T08:57:06.1476957Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T08:57:06.1478494Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T08:57:06.1480171Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T08:57:06.1482418Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T08:57:06.1483990Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T08:57:06.1485552Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T08:57:06.1487675Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T08:57:06.1489336Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T08:57:06.1490992Z * [new branch] gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T08:57:06.1493085Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T08:57:06.1494654Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T08:57:06.1496266Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T08:57:06.1498829Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T08:57:06.1500279Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T08:57:06.1501867Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T08:57:06.1504029Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T08:57:06.1505650Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T08:57:06.1507297Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T08:57:06.1509362Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T08:57:06.1511044Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T08:57:06.1512531Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 
2025-12-04T08:57:06.1514611Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T08:57:06.1516251Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T08:57:06.1517986Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T08:57:06.1520164Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T08:57:06.1522235Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T08:57:06.1523780Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T08:57:06.1525945Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T08:57:06.1527509Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T08:57:06.1529190Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T08:57:06.1531222Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T08:57:06.1532828Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T08:57:06.1534625Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T08:57:06.1536553Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T08:57:06.1538084Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T08:57:06.1539557Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T08:57:06.1542146Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T08:57:06.1543790Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T08:57:06.1545443Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T08:57:06.1547535Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T08:57:06.1549161Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T08:57:06.1551180Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T08:57:06.1553227Z * [new branch] gh/robert-hardwick/5/base -> 
origin/gh/robert-hardwick/5/base 2025-12-04T08:57:06.1554804Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T08:57:06.1556406Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T08:57:06.1558574Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T08:57:06.1560202Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 2025-12-04T08:57:06.1561866Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T08:57:06.1563971Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T08:57:06.1565559Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T08:57:06.1567210Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T08:57:06.1569326Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T08:57:06.1571016Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T08:57:06.1572635Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T08:57:06.1574709Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T08:57:06.1576272Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T08:57:06.1577852Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T08:57:06.1580920Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T08:57:06.1582464Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T08:57:06.1584540Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T08:57:06.1586125Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T08:57:06.1588307Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T08:57:06.1589831Z * [new 
branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T08:57:06.1591524Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T08:57:06.1593554Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T08:57:06.1595121Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T08:57:06.1596818Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T08:57:06.1598785Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T08:57:06.1600361Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T08:57:06.1602018Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T08:57:06.1604072Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T08:57:06.1605663Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T08:57:06.1607701Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T08:57:06.1610092Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T08:57:06.1611748Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T08:57:06.1613310Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T08:57:06.1615404Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T08:57:06.1616929Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T08:57:06.1618728Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T08:57:06.1620645Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T08:57:06.1622406Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T08:57:06.1623966Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T08:57:06.1626176Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T08:57:06.1627715Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T08:57:06.1629310Z * [new branch] 
gh/rtimpe/29/orig -> origin/gh/rtimpe/29/orig 2025-12-04T08:57:06.1631761Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T08:57:06.1633290Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T08:57:06.1635376Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T08:57:06.1636956Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T08:57:06.1638541Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T08:57:06.1640697Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T08:57:06.1642248Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T08:57:06.1643985Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T08:57:06.1646033Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T08:57:06.1647701Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T08:57:06.1649288Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T08:57:06.1651594Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T08:57:06.1653586Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T08:57:06.1655214Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T08:57:06.1657255Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T08:57:06.1658833Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T08:57:06.1660411Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T08:57:06.1662749Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T08:57:06.1664162Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T08:57:06.1665764Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T08:57:06.1667820Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T08:57:06.1669432Z * [new branch] gh/rtimpe/4/head -> 
origin/gh/rtimpe/4/head 2025-12-04T08:57:06.1672628Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T08:57:06.1674242Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T08:57:06.1675825Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T08:57:06.1677935Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T08:57:06.1679623Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T08:57:06.1681449Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T08:57:06.1683554Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T08:57:06.1685100Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T08:57:06.1687149Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T08:57:06.1689139Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T08:57:06.1690857Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T08:57:06.1692432Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T08:57:06.1695251Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T08:57:06.1696725Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T08:57:06.1698253Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-12-04T08:57:06.1700228Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T08:57:06.1701765Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T08:57:06.1703328Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T08:57:06.1705442Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T08:57:06.1707010Z * [new 
branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T08:57:06.1708605Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T08:57:06.1711274Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T08:57:06.1712829Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T08:57:06.1714434Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T08:57:06.1717166Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T08:57:06.1718852Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T08:57:06.1720504Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T08:57:06.1722657Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T08:57:06.1724261Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T08:57:06.1725832Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T08:57:06.1728018Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T08:57:06.1729528Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T08:57:06.1731077Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T08:57:06.1733137Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T08:57:06.1734701Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T08:57:06.1736318Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T08:57:06.1738923Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T08:57:06.1740487Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T08:57:06.1742069Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T08:57:06.1744145Z * [new branch] gh/seemethere/63/base 
-> origin/gh/seemethere/63/base 2025-12-04T08:57:06.1745772Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T08:57:06.1747340Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T08:57:06.1749409Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T08:57:06.1751108Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T08:57:06.1752667Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T08:57:06.1754795Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T08:57:06.1756385Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T08:57:06.1757982Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T08:57:06.1760077Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T08:57:06.1761817Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T08:57:06.1763467Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 2025-12-04T08:57:06.1765572Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T08:57:06.1767152Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T08:57:06.1769134Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T08:57:06.1771296Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T08:57:06.1772895Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T08:57:06.1774493Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T08:57:06.1776625Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T08:57:06.1778200Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T08:57:06.1779834Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 
2025-12-04T08:57:06.1782595Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T08:57:06.1784308Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T08:57:06.1785991Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T08:57:06.1788306Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T08:57:06.1790180Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T08:57:06.1791694Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T08:57:06.1793827Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T08:57:06.1795499Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T08:57:06.1797549Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T08:57:06.1799843Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T08:57:06.1801492Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T08:57:06.1803107Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T08:57:06.1805284Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T08:57:06.1806857Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T08:57:06.1808426Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T08:57:06.1810762Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T08:57:06.1812328Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T08:57:06.1813931Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T08:57:06.1816178Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T08:57:06.1819124Z * [new branch] 
gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T08:57:06.1820698Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T08:57:06.1822691Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T08:57:06.1824354Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T08:57:06.1826024Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T08:57:06.1828178Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T08:57:06.1829811Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 2025-12-04T08:57:06.1831473Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T08:57:06.1833681Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T08:57:06.1835382Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T08:57:06.1837175Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T08:57:06.1839311Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T08:57:06.1841228Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T08:57:06.1842808Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T08:57:06.1845052Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T08:57:06.1846755Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T08:57:06.1848418Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T08:57:06.1850497Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T08:57:06.1852489Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T08:57:06.1854313Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 
2025-12-04T08:57:06.1856347Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T08:57:06.1857847Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T08:57:06.1859366Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T08:57:06.1861478Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T08:57:06.1863183Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T08:57:06.1864753Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T08:57:06.1867058Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T08:57:06.1868792Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T08:57:06.1870428Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T08:57:06.1873006Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T08:57:06.1875095Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T08:57:06.1876726Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T08:57:06.1878993Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T08:57:06.1880743Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T08:57:06.1882304Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T08:57:06.1884944Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T08:57:06.1886600Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T08:57:06.1888560Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T08:57:06.1890083Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T08:57:06.1892143Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 
2025-12-04T08:57:06.1893707Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T08:57:06.1895696Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T08:57:06.1897239Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T08:57:06.1899816Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T08:57:06.1902073Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T08:57:06.1903715Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T08:57:06.1905888Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T08:57:06.1907456Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T08:57:06.1909182Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T08:57:06.1911362Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T08:57:06.1912963Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T08:57:06.1914574Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T08:57:06.1916822Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T08:57:06.1918620Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T08:57:06.1920331Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T08:57:06.1922414Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T08:57:06.1923980Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T08:57:06.1925561Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T08:57:06.1927679Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T08:57:06.1929402Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T08:57:06.1931102Z * [new branch] gh/slayton58/46/orig -> 
origin/gh/slayton58/46/orig 2025-12-04T08:57:06.1933254Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T08:57:06.1934890Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T08:57:06.1936874Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T08:57:06.1938370Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T08:57:06.1941469Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T08:57:06.1943038Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T08:57:06.1944687Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T08:57:06.1946867Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T08:57:06.1948450Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T08:57:06.1950060Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T08:57:06.1952442Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T08:57:06.1954082Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T08:57:06.1955753Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T08:57:06.1958012Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T08:57:06.1959648Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T08:57:06.1961394Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T08:57:06.1963639Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T08:57:06.1965325Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-12-04T08:57:06.1966892Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T08:57:06.1968998Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 
2025-12-04T08:57:06.1970708Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T08:57:06.1972313Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T08:57:06.1974508Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T08:57:06.1976187Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T08:57:06.1977778Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T08:57:06.1979873Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T08:57:06.1981413Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T08:57:06.1983031Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T08:57:06.1985322Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T08:57:06.1986817Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T08:57:06.1988360Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T08:57:06.1990625Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T08:57:06.1992145Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T08:57:06.1993709Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T08:57:06.1995895Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T08:57:06.1997514Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T08:57:06.1999061Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T08:57:06.2001952Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T08:57:06.2003457Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T08:57:06.2005015Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T08:57:06.2007294Z * [new 
branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T08:57:06.2008876Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T08:57:06.2010398Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T08:57:06.2012422Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T08:57:06.2013942Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T08:57:06.2015532Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T08:57:06.2017929Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T08:57:06.2019868Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T08:57:06.2021823Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T08:57:06.2023860Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T08:57:06.2025573Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T08:57:06.2027176Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T08:57:06.2029925Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T08:57:06.2031502Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-12-04T08:57:06.2033136Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T08:57:06.2035664Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T08:57:06.2037347Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T08:57:06.2039044Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T08:57:06.2041429Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T08:57:06.2043059Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T08:57:06.2044643Z * [new branch] gh/soulitzer/374/orig -> 
origin/gh/soulitzer/374/orig 2025-12-04T08:57:06.2046779Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T08:57:06.2048328Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T08:57:06.2049942Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T08:57:06.2052017Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T08:57:06.2053548Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T08:57:06.2055122Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T08:57:06.2057297Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T08:57:06.2058899Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T08:57:06.2060514Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T08:57:06.2062731Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T08:57:06.2064389Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T08:57:06.2065953Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T08:57:06.2068174Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T08:57:06.2069726Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T08:57:06.2071316Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T08:57:06.2073483Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T08:57:06.2075107Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T08:57:06.2076655Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T08:57:06.2078795Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T08:57:06.2080448Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 
2025-12-04T08:57:06.2082002Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T08:57:06.2084175Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T08:57:06.2085816Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T08:57:06.2087378Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T08:57:06.2089677Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T08:57:06.2091301Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T08:57:06.2092887Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T08:57:06.2095033Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T08:57:06.2096613Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 2025-12-04T08:57:06.2098168Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T08:57:06.2100652Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T08:57:06.2102938Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T08:57:06.2104526Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T08:57:06.2106176Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T08:57:06.2108338Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T08:57:06.2109869Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T08:57:06.2111442Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T08:57:06.2113665Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T08:57:06.2115176Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T08:57:06.2116728Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T08:57:06.2119230Z * [new branch] 
gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T08:57:06.2120825Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T08:57:06.2122435Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T08:57:06.2124463Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T08:57:06.2126028Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T08:57:06.2127722Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T08:57:06.2129861Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T08:57:06.2131438Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T08:57:06.2133027Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T08:57:06.2135166Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T08:57:06.2136778Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T08:57:06.2138397Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T08:57:06.2140557Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T08:57:06.2142293Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T08:57:06.2143890Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T08:57:06.2145991Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T08:57:06.2147576Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T08:57:06.2149316Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T08:57:06.2151528Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T08:57:06.2153159Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T08:57:06.2154645Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 
2025-12-04T08:57:06.2156975Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T08:57:06.2158624Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T08:57:06.2160188Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T08:57:06.2162429Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T08:57:06.2163892Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 2025-12-04T08:57:06.2165461Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T08:57:06.2167701Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T08:57:06.2169732Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T08:57:06.2171469Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T08:57:06.2173685Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T08:57:06.2175295Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T08:57:06.2177011Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T08:57:06.2179058Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T08:57:06.2180754Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T08:57:06.2182329Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T08:57:06.2185030Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T08:57:06.2186618Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T08:57:06.2188284Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T08:57:06.2190400Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T08:57:06.2192103Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T08:57:06.2193754Z * [new branch] gh/swolchok/867/orig -> 
origin/gh/swolchok/867/orig 2025-12-04T08:57:06.2195916Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T08:57:06.2197539Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T08:57:06.2199236Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T08:57:06.2201538Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T08:57:06.2203114Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T08:57:06.2204711Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T08:57:06.2206959Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T08:57:06.2208514Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T08:57:06.2209986Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T08:57:06.2212221Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T08:57:06.2213817Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T08:57:06.2215545Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T08:57:06.2220009Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T08:57:06.2221629Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T08:57:06.2223233Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T08:57:06.2225793Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T08:57:06.2227387Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T08:57:06.2228988Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T08:57:06.2231074Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T08:57:06.2232768Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T08:57:06.2234798Z * [new branch] gh/tianyu-l/4/base -> 
origin/gh/tianyu-l/4/base 2025-12-04T08:57:06.2236394Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T08:57:06.2237990Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T08:57:06.2241053Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T08:57:06.2242662Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T08:57:06.2244440Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T08:57:06.2246570Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T08:57:06.2248124Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T08:57:06.2249664Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T08:57:06.2251879Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T08:57:06.2253401Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T08:57:06.2255042Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T08:57:06.2257281Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T08:57:06.2258907Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T08:57:06.2260432Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T08:57:06.2262757Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T08:57:06.2264409Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T08:57:06.2265985Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T08:57:06.2268083Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T08:57:06.2269686Z * [new branch] gh/tugsbayasgalan/32/head -> 
origin/gh/tugsbayasgalan/32/head 2025-12-04T08:57:06.2271301Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T08:57:06.2273501Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T08:57:06.2275114Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T08:57:06.2276744Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T08:57:06.2278922Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T08:57:06.2280617Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T08:57:06.2282202Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T08:57:06.2284322Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T08:57:06.2285862Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T08:57:06.2287436Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T08:57:06.2289592Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T08:57:06.2291095Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T08:57:06.2292658Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T08:57:06.2294676Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T08:57:06.2296409Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 2025-12-04T08:57:06.2297954Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T08:57:06.2300138Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T08:57:06.2301742Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T08:57:06.2303458Z * [new branch] 
gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T08:57:06.2305368Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T08:57:06.2306962Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T08:57:06.2308579Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T08:57:06.2310724Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T08:57:06.2312283Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T08:57:06.2313866Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T08:57:06.2316098Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T08:57:06.2317961Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T08:57:06.2319618Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T08:57:06.2322162Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T08:57:06.2323751Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T08:57:06.2325266Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T08:57:06.2327347Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T08:57:06.2328841Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T08:57:06.2330577Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T08:57:06.2332540Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T08:57:06.2334186Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T08:57:06.2335765Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T08:57:06.2338409Z * [new 
branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T08:57:06.2339916Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T08:57:06.2341455Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T08:57:06.2343815Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T08:57:06.2345393Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T08:57:06.2347048Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T08:57:06.2349199Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T08:57:06.2350765Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T08:57:06.2352398Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T08:57:06.2354706Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T08:57:06.2356281Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T08:57:06.2357868Z * [new branch] gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T08:57:06.2360171Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T08:57:06.2361801Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T08:57:06.2363399Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T08:57:06.2366041Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T08:57:06.2367644Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T08:57:06.2369197Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T08:57:06.2371753Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 
2025-12-04T08:57:06.2373627Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T08:57:06.2375324Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T08:57:06.2377556Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T08:57:06.2379180Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T08:57:06.2380754Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T08:57:06.2383267Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T08:57:06.2384854Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T08:57:06.2386491Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T08:57:06.2389384Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T08:57:06.2404259Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T08:57:06.2404465Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T08:57:06.2404641Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T08:57:06.2404821Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T08:57:06.2404996Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T08:57:06.2405172Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T08:57:06.2405470Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T08:57:06.2405767Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T08:57:06.2405946Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T08:57:06.2407665Z * [new branch] gh/tugsbayasgalan/77/head -> 
origin/gh/tugsbayasgalan/77/head 2025-12-04T08:57:06.2409372Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T08:57:06.2411737Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T08:57:06.2413534Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T08:57:06.2415088Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T08:57:06.2417415Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T08:57:06.2420371Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T08:57:06.2421906Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 2025-12-04T08:57:06.2424091Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T08:57:06.2425632Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T08:57:06.2427253Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T08:57:06.2429512Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T08:57:06.2430945Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T08:57:06.2432453Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T08:57:06.2435128Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T08:57:06.2436646Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T08:57:06.2438221Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T08:57:06.2441424Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T08:57:06.2443184Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T08:57:06.2444771Z * [new branch] 
gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T08:57:06.2446839Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T08:57:06.2448526Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T08:57:06.2450239Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T08:57:06.2452231Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T08:57:06.2453835Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T08:57:06.2455409Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T08:57:06.2457573Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T08:57:06.2459117Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T08:57:06.2460774Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T08:57:06.2462883Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T08:57:06.2464552Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T08:57:06.2466089Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T08:57:06.2468492Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T08:57:06.2470493Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T08:57:06.2472101Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T08:57:06.2474355Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T08:57:06.2475884Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T08:57:06.2477604Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T08:57:06.2480011Z 
* [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T08:57:06.2481738Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T08:57:06.2483305Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T08:57:06.2485481Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T08:57:06.2486934Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T08:57:06.2488485Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T08:57:06.2490660Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T08:57:06.2492390Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T08:57:06.2493822Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T08:57:06.2496494Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T08:57:06.2498096Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T08:57:06.2499715Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T08:57:06.2502026Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T08:57:06.2503680Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T08:57:06.2505222Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T08:57:06.2507935Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T08:57:06.2509644Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T08:57:06.2511220Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T08:57:06.2513884Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T08:57:06.2515370Z * [new 
branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T08:57:06.2516885Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T08:57:06.2519120Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T08:57:06.2520794Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T08:57:06.2522394Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T08:57:06.2524570Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T08:57:06.2526182Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T08:57:06.2527767Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T08:57:06.2529934Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T08:57:06.2531491Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T08:57:06.2533047Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T08:57:06.2535199Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T08:57:06.2536852Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T08:57:06.2538447Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T08:57:06.2540496Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T08:57:06.2542095Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T08:57:06.2543690Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T08:57:06.2546372Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T08:57:06.2548046Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T08:57:06.2550136Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T08:57:06.2551764Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T08:57:06.2553369Z * [new branch] gh/vishal9-team/2/orig -> origin/gh/vishal9-team/2/orig 2025-12-04T08:57:06.2555537Z * [new branch] gh/vishal9-team/3/base -> 
origin/gh/vishal9-team/3/base 2025-12-04T08:57:06.2557343Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T08:57:06.2558858Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T08:57:06.2560903Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T08:57:06.2562451Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T08:57:06.2564041Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T08:57:06.2566623Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T08:57:06.2568714Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T08:57:06.2570812Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T08:57:06.2573407Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T08:57:06.2575467Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T08:57:06.2577097Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T08:57:06.2579195Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T08:57:06.2580919Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T08:57:06.2582524Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T08:57:06.2584701Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T08:57:06.2586328Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T08:57:06.2587940Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T08:57:06.2590125Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T08:57:06.2591699Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T08:57:06.2593724Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T08:57:06.2595885Z * [new branch] 
gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T08:57:06.2597489Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T08:57:06.2599222Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T08:57:06.2601467Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T08:57:06.2603057Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T08:57:06.2604704Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T08:57:06.2606793Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T08:57:06.2608449Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T08:57:06.2609952Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T08:57:06.2611925Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T08:57:06.2613598Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T08:57:06.2615194Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T08:57:06.2617569Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T08:57:06.2619115Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T08:57:06.2620624Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T08:57:06.2622741Z * [new branch] gh/wconstab/453/base -> origin/gh/wconstab/453/base 2025-12-04T08:57:06.2624385Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T08:57:06.2626192Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T08:57:06.2628314Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T08:57:06.2629969Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T08:57:06.2632893Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 
2025-12-04T08:57:06.2634319Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T08:57:06.2635987Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T08:57:06.2637483Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T08:57:06.2639798Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T08:57:06.2641741Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T08:57:06.2643442Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T08:57:06.2645514Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T08:57:06.2647071Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T08:57:06.2648723Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T08:57:06.2650922Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T08:57:06.2652551Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T08:57:06.2654133Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T08:57:06.2656180Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T08:57:06.2657806Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T08:57:06.2659378Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T08:57:06.2662054Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T08:57:06.2663760Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T08:57:06.2665461Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T08:57:06.2667713Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T08:57:06.2669268Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T08:57:06.2670821Z * [new branch] gh/wconstab/461/orig -> 
origin/gh/wconstab/461/orig 2025-12-04T08:57:06.2672967Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T08:57:06.2674617Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T08:57:06.2676266Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T08:57:06.2678495Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T08:57:06.2680158Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T08:57:06.2681801Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T08:57:06.2683975Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T08:57:06.2685623Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T08:57:06.2687380Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T08:57:06.2689479Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T08:57:06.2691071Z * [new branch] gh/wconstab/465/head -> origin/gh/wconstab/465/head 2025-12-04T08:57:06.2692987Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T08:57:06.2695335Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T08:57:06.2696852Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T08:57:06.2698339Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T08:57:06.2700921Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T08:57:06.2702586Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T08:57:06.2704216Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T08:57:06.2706254Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T08:57:06.2707816Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T08:57:06.2709368Z * [new branch] 
gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T08:57:06.2711977Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T08:57:06.2713509Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T08:57:06.2715214Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T08:57:06.2717730Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T08:57:06.2719268Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T08:57:06.2721066Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T08:57:06.2723239Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T08:57:06.2724844Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T08:57:06.2726566Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T08:57:06.2729221Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T08:57:06.2730815Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T08:57:06.2732804Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T08:57:06.2735043Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T08:57:06.2736776Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T08:57:06.2738353Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T08:57:06.2740497Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T08:57:06.2742093Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T08:57:06.2743706Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T08:57:06.2745875Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T08:57:06.2747504Z * [new branch] 
gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T08:57:06.2749126Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T08:57:06.2751335Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T08:57:06.2753134Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T08:57:06.2754510Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T08:57:06.2756833Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T08:57:06.2758565Z * [new branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T08:57:06.2760176Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T08:57:06.2762332Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T08:57:06.2763884Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T08:57:06.2765547Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T08:57:06.2767705Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T08:57:06.2769439Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T08:57:06.2771018Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T08:57:06.2773235Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T08:57:06.2774861Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T08:57:06.2776437Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T08:57:06.2778515Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T08:57:06.2780158Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T08:57:06.2781743Z * [new branch] 
gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T08:57:06.2784899Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T08:57:06.2786475Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T08:57:06.2788051Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T08:57:06.2790210Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T08:57:06.2791804Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T08:57:06.2793396Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T08:57:06.2795641Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T08:57:06.2797257Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T08:57:06.2798900Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T08:57:06.2801210Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T08:57:06.2803343Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T08:57:06.2804949Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T08:57:06.2807203Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T08:57:06.2808776Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T08:57:06.2810277Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T08:57:06.2812471Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T08:57:06.2814096Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T08:57:06.2815775Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T08:57:06.2819958Z * [new branch] 
gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T08:57:06.2821569Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T08:57:06.2823296Z * [new branch] gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T08:57:06.2825527Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T08:57:06.2827206Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T08:57:06.2828793Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T08:57:06.2831018Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T08:57:06.2832705Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T08:57:06.2834312Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T08:57:06.2836424Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T08:57:06.2838048Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T08:57:06.2839603Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T08:57:06.2842104Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T08:57:06.2843675Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T08:57:06.2845221Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T08:57:06.2847404Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T08:57:06.2849180Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T08:57:06.2850778Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T08:57:06.2853522Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T08:57:06.2858080Z * [new branch] 
gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T08:57:06.2859788Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T08:57:06.2862027Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T08:57:06.2863555Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T08:57:06.2865098Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T08:57:06.2867317Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T08:57:06.2868885Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T08:57:06.2870369Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T08:57:06.2872629Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T08:57:06.2874298Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T08:57:06.2875902Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T08:57:06.2878106Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T08:57:06.2879700Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T08:57:06.2881419Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T08:57:06.2883858Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T08:57:06.2885274Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T08:57:06.2886756Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T08:57:06.2889038Z * [new branch] gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T08:57:06.2890646Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T08:57:06.2892651Z * [new branch] 
gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T08:57:06.2894818Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T08:57:06.2896944Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T08:57:06.2898553Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T08:57:06.2900779Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T08:57:06.2902371Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T08:57:06.2903932Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T08:57:06.2906156Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T08:57:06.2907755Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T08:57:06.2909412Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T08:57:06.2911727Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T08:57:06.2913319Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T08:57:06.2914912Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T08:57:06.2917255Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T08:57:06.2919106Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T08:57:06.2920706Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T08:57:06.2922988Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T08:57:06.2924521Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T08:57:06.2926155Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T08:57:06.2928240Z * [new branch] 
gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T08:57:06.2929807Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T08:57:06.2931584Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T08:57:06.2933724Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T08:57:06.2935279Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T08:57:06.2936832Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T08:57:06.2939085Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T08:57:06.2940588Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T08:57:06.2942178Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T08:57:06.2944575Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T08:57:06.2946143Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T08:57:06.2947802Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T08:57:06.2949934Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T08:57:06.2951587Z * [new branch] gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T08:57:06.2953123Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T08:57:06.2955404Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T08:57:06.2957038Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T08:57:06.2958604Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T08:57:06.2960812Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T08:57:06.2962534Z * [new branch] 
gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T08:57:06.2964186Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T08:57:06.2966377Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T08:57:06.2967958Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T08:57:06.2969576Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T08:57:06.2971754Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T08:57:06.2973328Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T08:57:06.2974896Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T08:57:06.2977108Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T08:57:06.2978746Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T08:57:06.2980290Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T08:57:06.2982499Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T08:57:06.2984267Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T08:57:06.2985885Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T08:57:06.2988475Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T08:57:06.2990147Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T08:57:06.2992179Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T08:57:06.2993684Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T08:57:06.2995780Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T08:57:06.2997368Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T08:57:06.2998978Z * [new branch] 
gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T08:57:06.3001619Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T08:57:06.3003166Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T08:57:06.3004849Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T08:57:06.3006887Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T08:57:06.3008562Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T08:57:06.3010015Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T08:57:06.3012119Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T08:57:06.3013670Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T08:57:06.3015238Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T08:57:06.3017995Z * [new branch] gh/xmfan/309/base -> origin/gh/xmfan/309/base 2025-12-04T08:57:06.3019657Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T08:57:06.3021193Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T08:57:06.3023287Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T08:57:06.3024878Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T08:57:06.3026577Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T08:57:06.3028642Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T08:57:06.3030257Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T08:57:06.3031810Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T08:57:06.3033910Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T08:57:06.3035478Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T08:57:06.3037097Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T08:57:06.3039276Z * [new branch] gh/xmfan/313/base 
-> origin/gh/xmfan/313/base 2025-12-04T08:57:06.3040964Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T08:57:06.3042598Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T08:57:06.3045157Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T08:57:06.3046761Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T08:57:06.3048354Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T08:57:06.3050474Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T08:57:06.3052014Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T08:57:06.3053580Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T08:57:06.3055662Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T08:57:06.3057303Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T08:57:06.3058961Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T08:57:06.3061773Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T08:57:06.3063379Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T08:57:06.3064989Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T08:57:06.3067345Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T08:57:06.3068982Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T08:57:06.3070573Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T08:57:06.3073349Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T08:57:06.3074800Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T08:57:06.3076356Z * [new branch] gh/yanbing-j/11/orig -> 
origin/gh/yanbing-j/11/orig 2025-12-04T08:57:06.3078432Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T08:57:06.3080058Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T08:57:06.3081688Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T08:57:06.3083836Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T08:57:06.3085434Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-12-04T08:57:06.3087012Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T08:57:06.3089183Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T08:57:06.3090807Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T08:57:06.3092762Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T08:57:06.3094778Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T08:57:06.3096353Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T08:57:06.3097929Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T08:57:06.3100005Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T08:57:06.3101659Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T08:57:06.3103279Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T08:57:06.3105398Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T08:57:06.3107420Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T08:57:06.3109007Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T08:57:06.3111198Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T08:57:06.3112791Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T08:57:06.3114382Z * [new branch] 
gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T08:57:06.3116515Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T08:57:06.3118396Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T08:57:06.3120539Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T08:57:06.3122061Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T08:57:06.3123640Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T08:57:06.3125922Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T08:57:06.3127520Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T08:57:06.3129223Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T08:57:06.3131338Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T08:57:06.3132995Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T08:57:06.3134606Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T08:57:06.3136868Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T08:57:06.3138384Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T08:57:06.3139848Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T08:57:06.3141986Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T08:57:06.3143526Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T08:57:06.3145134Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T08:57:06.3147845Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T08:57:06.3149508Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T08:57:06.3151205Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 
2025-12-04T08:57:06.3153907Z * [new branch] gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T08:57:06.3155757Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T08:57:06.3157546Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T08:57:06.3159742Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T08:57:06.3161485Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T08:57:06.3163101Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T08:57:06.3165628Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T08:57:06.3167211Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T08:57:06.3168866Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T08:57:06.3170910Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T08:57:06.3172522Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T08:57:06.3174209Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T08:57:06.3176417Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T08:57:06.3178079Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T08:57:06.3179680Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T08:57:06.3181813Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T08:57:06.3183366Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T08:57:06.3185022Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T08:57:06.3187092Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T08:57:06.3188677Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T08:57:06.3190259Z * [new branch] 
gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T08:57:06.3192410Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T08:57:06.3194046Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T08:57:06.3195596Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T08:57:06.3197779Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T08:57:06.3199401Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T08:57:06.3201196Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T08:57:06.3203724Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T08:57:06.3205255Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T08:57:06.3206806Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T08:57:06.3208988Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T08:57:06.3210985Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T08:57:06.3212566Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T08:57:06.3214897Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T08:57:06.3216531Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T08:57:06.3219324Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T08:57:06.3221345Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T08:57:06.3222866Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-12-04T08:57:06.3224455Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T08:57:06.3226592Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T08:57:06.3228602Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T08:57:06.3230413Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 
2025-12-04T08:57:06.3232512Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T08:57:06.3234084Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T08:57:06.3235663Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T08:57:06.3237770Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T08:57:06.3239340Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T08:57:06.3241373Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T08:57:06.3243582Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T08:57:06.3245167Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T08:57:06.3246752Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T08:57:06.3248846Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T08:57:06.3250320Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T08:57:06.3251812Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T08:57:06.3253771Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T08:57:06.3255313Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T08:57:06.3256915Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T08:57:06.3259151Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T08:57:06.3260676Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T08:57:06.3262297Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T08:57:06.3264387Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T08:57:06.3265942Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T08:57:06.3267724Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T08:57:06.3269596Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 
2025-12-04T08:57:06.3271135Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T08:57:06.3272712Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T08:57:06.3274669Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T08:57:06.3276285Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T08:57:06.3277892Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T08:57:06.3279845Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T08:57:06.3281588Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T08:57:06.3283283Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T08:57:06.3285727Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T08:57:06.3287315Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T08:57:06.3288907Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T08:57:06.3291536Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T08:57:06.3293621Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 2025-12-04T08:57:06.3295203Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T08:57:06.3297358Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T08:57:06.3298960Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T08:57:06.3300508Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T08:57:06.3303118Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T08:57:06.3304765Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T08:57:06.3306894Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T08:57:06.3308522Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T08:57:06.3311554Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 
2025-12-04T08:57:06.3313475Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T08:57:06.3315125Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T08:57:06.3317879Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T08:57:06.3319558Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T08:57:06.3321543Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T08:57:06.3324231Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T08:57:06.3325946Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T08:57:06.3327867Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T08:57:06.3329693Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T08:57:06.3332288Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T08:57:06.3333863Z * [new branch] gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T08:57:06.3336108Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T08:57:06.3337725Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T08:57:06.3339277Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T08:57:06.3341362Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T08:57:06.3342933Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T08:57:06.3344555Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T08:57:06.3346629Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T08:57:06.3348201Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T08:57:06.3350347Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T08:57:06.3351895Z * [new branch] gh/yushangdi/7/head -> 
origin/gh/yushangdi/7/head 2025-12-04T08:57:06.3353464Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T08:57:06.3355848Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T08:57:06.3357563Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T08:57:06.3359271Z * [new branch] gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T08:57:06.3361478Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T08:57:06.3363063Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T08:57:06.3364748Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T08:57:06.3367400Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-12-04T08:57:06.3368958Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T08:57:06.3370898Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T08:57:06.3373011Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-12-04T08:57:06.3374602Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T08:57:06.3376218Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T08:57:06.3378360Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T08:57:06.3379954Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T08:57:06.3381556Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T08:57:06.3383661Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T08:57:06.3385241Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T08:57:06.3386797Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T08:57:06.3388967Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T08:57:06.3390827Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T08:57:06.3392979Z * [new branch] 
gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T08:57:06.3395077Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T08:57:06.3396670Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T08:57:06.3398310Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T08:57:06.3401102Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T08:57:06.3402728Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T08:57:06.3404258Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T08:57:06.3406611Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T08:57:06.3408678Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T08:57:06.3410341Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T08:57:06.3412695Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T08:57:06.3414303Z * [new branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T08:57:06.3416257Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T08:57:06.3418615Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T08:57:06.3420244Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T08:57:06.3421860Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T08:57:06.3423813Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T08:57:06.3425450Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T08:57:06.3427020Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T08:57:06.3429613Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T08:57:06.3431224Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-12-04T08:57:06.3433411Z * [new branch] gh/zpcore/11/base -> 
origin/gh/zpcore/11/base 2025-12-04T08:57:06.3435055Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T08:57:06.3436678Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T08:57:06.3439152Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-12-04T08:57:06.3440871Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T08:57:06.3442476Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T08:57:06.3444722Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T08:57:06.3446273Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T08:57:06.3447793Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T08:57:06.3449938Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T08:57:06.3451506Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T08:57:06.3453129Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 2025-12-04T08:57:06.3455432Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T08:57:06.3456984Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T08:57:06.3458595Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T08:57:06.3460693Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T08:57:06.3462281Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T08:57:06.3464826Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T08:57:06.3466496Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T08:57:06.3468097Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T08:57:06.3470555Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T08:57:06.3472453Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T08:57:06.3474080Z * [new branch] gh/zpcore/22/orig -> 
origin/gh/zpcore/22/orig 2025-12-04T08:57:06.3476345Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T08:57:06.3478353Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T08:57:06.3480092Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T08:57:06.3482158Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T08:57:06.3484185Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T08:57:06.3485769Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T08:57:06.3488141Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T08:57:06.3489754Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T08:57:06.3491580Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T08:57:06.3493810Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T08:57:06.3495488Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 2025-12-04T08:57:06.3496933Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T08:57:06.3499160Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T08:57:06.3500704Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T08:57:06.3502329Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T08:57:06.3504895Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 2025-12-04T08:57:06.3506770Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T08:57:06.3508403Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T08:57:06.3510461Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T08:57:06.3512059Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T08:57:06.3513954Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T08:57:06.3515507Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 
2025-12-04T08:57:06.3518181Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T08:57:06.3519829Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T08:57:06.3521873Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T08:57:06.3523356Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T08:57:06.3525765Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T08:57:06.3527304Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T08:57:06.3529367Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T08:57:06.3531250Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T08:57:06.3532990Z * [new branch] google-main -> origin/google-main 2025-12-04T08:57:06.3535071Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T08:57:06.3536551Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T08:57:06.3539185Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T08:57:06.3540988Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T08:57:06.3542583Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T08:57:06.3544048Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T08:57:06.3545622Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T08:57:06.3547177Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T08:57:06.3549276Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T08:57:06.3551566Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T08:57:06.3553122Z * [new branch] inlining -> origin/inlining 2025-12-04T08:57:06.3554634Z * [new branch] 
inlining-ezyang -> origin/inlining-ezyang 2025-12-04T08:57:06.3556333Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T08:57:06.3558373Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T08:57:06.3559735Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T08:57:06.3561615Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T08:57:06.3563337Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T08:57:06.3565478Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T08:57:06.3566992Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T08:57:06.3569287Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T08:57:06.3570785Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T08:57:06.3572951Z * [new branch] jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter 2025-12-04T08:57:06.3574557Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T08:57:06.3576229Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T08:57:06.3577927Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T08:57:06.3579659Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T08:57:06.3581681Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T08:57:06.3583416Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T08:57:06.3585060Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T08:57:06.3586724Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T08:57:06.3588512Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 
2025-12-04T08:57:06.3590092Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T08:57:06.3591721Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T08:57:06.3593917Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T08:57:06.3596110Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T08:57:06.3597810Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T08:57:06.3599406Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T08:57:06.3601743Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T08:57:06.3603813Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T08:57:06.3605877Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T08:57:06.3607468Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-12-04T08:57:06.3608971Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T08:57:06.3610480Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T08:57:06.3613136Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T08:57:06.3615280Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T08:57:06.3616846Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T08:57:06.3620060Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T08:57:06.3621541Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T08:57:06.3623340Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T08:57:06.3624955Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T08:57:06.3626943Z * [new 
branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T08:57:06.3629011Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T08:57:06.3630508Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T08:57:06.3632136Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T08:57:06.3633893Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T08:57:06.3635517Z * [new branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T08:57:06.3637131Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T08:57:06.3638735Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T08:57:06.3640386Z * [new branch] lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts 2025-12-04T08:57:06.3642072Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T08:57:06.3643709Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T08:57:06.3645879Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T08:57:06.3647562Z * [new branch] main -> origin/main 2025-12-04T08:57:06.3649473Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T08:57:06.3651471Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T08:57:06.3653314Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T08:57:06.3655015Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T08:57:06.3656769Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T08:57:06.3658614Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T08:57:06.3660140Z * [new branch] 
malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T08:57:06.3661706Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T08:57:06.3663888Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T08:57:06.3665613Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T08:57:06.3667238Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T08:57:06.3668933Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T08:57:06.3670727Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T08:57:06.3673347Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T08:57:06.3674860Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T08:57:06.3676929Z * [new branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T08:57:06.3678603Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T08:57:06.3680379Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T08:57:06.3682109Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T08:57:06.3683829Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T08:57:06.3685481Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T08:57:06.3687751Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T08:57:06.3689257Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T08:57:06.3690750Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T08:57:06.3692193Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T08:57:06.3693694Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 
2025-12-04T08:57:06.3695318Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T08:57:06.3696734Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T08:57:06.3698221Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T08:57:06.3699561Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T08:57:06.3701365Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T08:57:06.3703191Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 2025-12-04T08:57:06.3704721Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T08:57:06.3706347Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T08:57:06.3708024Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T08:57:06.3709820Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T08:57:06.3711440Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-12-04T08:57:06.3713092Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T08:57:06.3714676Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T08:57:06.3716389Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T08:57:06.3718093Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T08:57:06.3719703Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T08:57:06.3721472Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T08:57:06.3723070Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T08:57:06.3724638Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T08:57:06.3726350Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T08:57:06.3727870Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T08:57:06.3729538Z * [new branch] mlazos/extract-examples -> 
origin/mlazos/extract-examples 2025-12-04T08:57:06.3731100Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T08:57:06.3733145Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T08:57:06.3734833Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T08:57:06.3736500Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T08:57:06.3738084Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T08:57:06.3739872Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T08:57:06.3741422Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T08:57:06.3743120Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T08:57:06.3745309Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T08:57:06.3747006Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T08:57:06.3748620Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T08:57:06.3750233Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-12-04T08:57:06.3751829Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T08:57:06.3753412Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T08:57:06.3755048Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T08:57:06.3756663Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T08:57:06.3758343Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T08:57:06.3760004Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T08:57:06.3761851Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T08:57:06.3763370Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-12-04T08:57:06.3764944Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T08:57:06.3766525Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T08:57:06.3768173Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T08:57:06.3769801Z * [new branch] mlazos/hc4 -> 
origin/mlazos/hc4 2025-12-04T08:57:06.3771365Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-12-04T08:57:06.3772986Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T08:57:06.3774583Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T08:57:06.3776101Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T08:57:06.3777890Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T08:57:06.3779467Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T08:57:06.3780963Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T08:57:06.3782413Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T08:57:06.3784011Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T08:57:06.3785655Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T08:57:06.3787903Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T08:57:06.3789512Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-12-04T08:57:06.3791086Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T08:57:06.3792847Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T08:57:06.3794453Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T08:57:06.3795976Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T08:57:06.3797624Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T08:57:06.3799232Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T08:57:06.3800991Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T08:57:06.3802605Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T08:57:06.3804213Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T08:57:06.3805814Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T08:57:06.3807431Z * [new branch] 
mlazos/rtp -> origin/mlazos/rtp 2025-12-04T08:57:06.3809109Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T08:57:06.3810737Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T08:57:06.3812274Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T08:57:06.3813983Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T08:57:06.3815579Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T08:57:06.3817342Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T08:57:06.3818979Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T08:57:06.3820549Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T08:57:06.3822185Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T08:57:06.3823871Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T08:57:06.3825536Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-12-04T08:57:06.3827139Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T08:57:06.3828738Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-12-04T08:57:06.3830797Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T08:57:06.3832321Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T08:57:06.3833997Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T08:57:06.3835850Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T08:57:06.3837218Z * [new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T08:57:06.3838950Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T08:57:06.3840518Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T08:57:06.3842668Z * [new branch] mlazos/user-streams-backup2 -> 
origin/mlazos/user-streams-backup2 2025-12-04T08:57:06.3844328Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T08:57:06.3845875Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T08:57:06.3847511Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T08:57:06.3849202Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T08:57:06.3850937Z * [new branch] module-shim -> origin/module-shim 2025-12-04T08:57:06.3852507Z * [new branch] move_config -> origin/move_config 2025-12-04T08:57:06.3854640Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T08:57:06.3856744Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T08:57:06.3858987Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T08:57:06.3860529Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T08:57:06.3862274Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-12-04T08:57:06.3863956Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T08:57:06.3865645Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T08:57:06.3867804Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T08:57:06.3869306Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T08:57:06.3870844Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T08:57:06.3872268Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T08:57:06.3873908Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T08:57:06.3875327Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T08:57:06.3876827Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T08:57:06.3878334Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T08:57:06.3879942Z * [new branch] nightly -> origin/nightly 
2025-12-04T08:57:06.3882301Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T08:57:06.3883828Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T08:57:06.3885490Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T08:57:06.3887272Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T08:57:06.3889132Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T08:57:06.3891390Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 2025-12-04T08:57:06.3893031Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T08:57:06.3895180Z * [new branch] nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune 2025-12-04T08:57:06.3897024Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T08:57:06.3898775Z * [new branch] nofun-hack -> origin/nofun-hack 2025-12-04T08:57:06.3900167Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T08:57:06.3902281Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T08:57:06.3903962Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T08:57:06.3905652Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T08:57:06.3908287Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T08:57:06.3909994Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T08:57:06.3911635Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T08:57:06.3913555Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T08:57:06.3915168Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 
2025-12-04T08:57:06.3916864Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T08:57:06.3918828Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T08:57:06.3920481Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T08:57:06.3922097Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T08:57:06.3923701Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T08:57:06.3925304Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T08:57:06.3926945Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T08:57:06.3928444Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T08:57:06.3929975Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T08:57:06.3931574Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T08:57:06.3933592Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-12-04T08:57:06.3935781Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T08:57:06.3937404Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T08:57:06.3940874Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T08:57:06.3942391Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T08:57:06.3944987Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T08:57:06.3946790Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T08:57:06.3948513Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T08:57:06.3950230Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T08:57:06.3952386Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T08:57:06.3954064Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T08:57:06.3955811Z * [new branch] pca2 -> origin/pca2 2025-12-04T08:57:06.3957669Z * [new branch] 
per_channel_backup -> origin/per_channel_backup 2025-12-04T08:57:06.3959336Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T08:57:06.3961454Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T08:57:06.3963487Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T08:57:06.3965462Z * [new branch] pianpwk/__draft_debug_mode -> origin/pianpwk/__draft_debug_mode 2025-12-04T08:57:06.3966799Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T08:57:06.3968360Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T08:57:06.3969872Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 2025-12-04T08:57:06.3971329Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T08:57:06.3973131Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 2025-12-04T08:57:06.3975061Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T08:57:06.3977304Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T08:57:06.3978833Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T08:57:06.3980462Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T08:57:06.3982128Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T08:57:06.3983782Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T08:57:06.3985491Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T08:57:06.3987114Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T08:57:06.3988761Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 
2025-12-04T08:57:06.3990387Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T08:57:06.3991966Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T08:57:06.3993613Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T08:57:06.3995215Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T08:57:06.3996897Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T08:57:06.3998577Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T08:57:06.4000332Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T08:57:06.4001961Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T08:57:06.4004208Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T08:57:06.4005783Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T08:57:06.4007363Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T08:57:06.4009076Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T08:57:06.4010805Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T08:57:06.4012540Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T08:57:06.4014197Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T08:57:06.4015987Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T08:57:06.4017556Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-12-04T08:57:06.4020661Z * [new branch] 
pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T08:57:06.4022311Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T08:57:06.4023975Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-12-04T08:57:06.4025586Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T08:57:06.4027216Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T08:57:06.4028840Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T08:57:06.4030516Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T08:57:06.4032088Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T08:57:06.4033814Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T08:57:06.4035397Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T08:57:06.4037130Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T08:57:06.4039802Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T08:57:06.4040675Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T08:57:06.4042202Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T08:57:06.4043822Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T08:57:06.4045472Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T08:57:06.4048041Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T08:57:06.4049548Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T08:57:06.4051206Z * 
[new branch] pool-separate -> origin/pool-separate 2025-12-04T08:57:06.4052809Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T08:57:06.4054976Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T08:57:06.4056656Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T08:57:06.4058349Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T08:57:06.4059912Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T08:57:06.4062085Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T08:57:06.4064394Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T08:57:06.4066066Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T08:57:06.4068431Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T08:57:06.4070201Z * [new branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T08:57:06.4072499Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T08:57:06.4074678Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T08:57:06.4076551Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T08:57:06.4077828Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T08:57:06.4079463Z * [new branch] release/1.13 -> origin/release/1.13 2025-12-04T08:57:06.4080995Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T08:57:06.4082367Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T08:57:06.4083889Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T08:57:06.4085496Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T08:57:06.4087191Z * [new branch] release/1.7 -> origin/release/1.7 2025-12-04T08:57:06.4088864Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T08:57:06.4090386Z * [new branch] release/1.9 -> 
origin/release/1.9 2025-12-04T08:57:06.4091971Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T08:57:06.4093689Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T08:57:06.4095507Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T08:57:06.4097409Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T08:57:06.4099483Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T08:57:06.4101592Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T08:57:06.4103282Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T08:57:06.4105083Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T08:57:06.4106731Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T08:57:06.4108456Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T08:57:06.4110153Z * [new branch] release_notes -> origin/release_notes 2025-12-04T08:57:06.4111871Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 2025-12-04T08:57:06.4113759Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T08:57:06.4115353Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T08:57:06.4116877Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T08:57:06.4118789Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T08:57:06.4122053Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T08:57:06.4124978Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T08:57:06.4128023Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T08:57:06.4131210Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 
2025-12-04T08:57:06.4133115Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T08:57:06.4134639Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T08:57:06.4136368Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T08:57:06.4137967Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T08:57:06.4140098Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T08:57:06.4141741Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T08:57:06.4143148Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T08:57:06.4144599Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T08:57:06.4146448Z * [new branch] ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T08:57:06.4148464Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T08:57:06.4150841Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T08:57:06.4152332Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-12-04T08:57:06.4154524Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T08:57:06.4155961Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T08:57:06.4157505Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T08:57:06.4159091Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T08:57:06.4160872Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T08:57:06.4163327Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 
2025-12-04T08:57:06.4164973Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T08:57:06.4166776Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T08:57:06.4168323Z * [new branch] save -> origin/save 2025-12-04T08:57:06.4169947Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T08:57:06.4171597Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T08:57:06.4174089Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T08:57:06.4175901Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T08:57:06.4177987Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T08:57:06.4179737Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T08:57:06.4181677Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T08:57:06.4183837Z * [new branch] some_rocm_inductor_skips -> origin/some_rocm_inductor_skips 2025-12-04T08:57:06.4185914Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T08:57:06.4187721Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T08:57:06.4189346Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T08:57:06.4190830Z * [new branch] suo -> origin/suo 2025-12-04T08:57:06.4192448Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T08:57:06.4194131Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T08:57:06.4195854Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T08:57:06.4197481Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T08:57:06.4199095Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T08:57:06.4201077Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T08:57:06.4202987Z * [new branch] sy_deserialize -> origin/sy_deserialize 
2025-12-04T08:57:06.4204546Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T08:57:06.4206143Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T08:57:06.4207883Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T08:57:06.4209567Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T08:57:06.4211490Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T08:57:06.4213206Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T08:57:06.4214911Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 2025-12-04T08:57:06.4216570Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T08:57:06.4218422Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T08:57:06.4219900Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T08:57:06.4221512Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 2025-12-04T08:57:06.4223258Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T08:57:06.4224933Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T08:57:06.4226576Z * [new branch] test-old -> origin/test-old 2025-12-04T08:57:06.4228861Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T08:57:06.4231117Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T08:57:06.4232672Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T08:57:06.4234136Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T08:57:06.4236110Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T08:57:06.4237985Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T08:57:06.4239860Z * [new branch] 
tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T08:57:06.4241676Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T08:57:06.4243353Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T08:57:06.4244855Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T08:57:06.4246407Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T08:57:06.4247986Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T08:57:06.4249596Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T08:57:06.4251176Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T08:57:06.4252951Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T08:57:06.4254585Z * [new branch] tmp -> origin/tmp 2025-12-04T08:57:06.4256322Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T08:57:06.4258022Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T08:57:06.4259728Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T08:57:06.4261558Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T08:57:06.4263137Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T08:57:06.4264766Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T08:57:06.4266500Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T08:57:06.4268162Z * [new branch] type_dec -> origin/type_dec 2025-12-04T08:57:06.4269871Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T08:57:06.4272115Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T08:57:06.4273611Z * [new branch] 
update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T08:57:06.4275243Z * [new branch] update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T08:57:06.4276764Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T08:57:06.4278335Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T08:57:06.4280355Z * [new branch] update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T08:57:06.4282668Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T08:57:06.4284758Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 2025-12-04T08:57:06.4286253Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T08:57:06.4287749Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T08:57:06.4289259Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T08:57:06.4290821Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T08:57:06.4292990Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T08:57:06.4294636Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T08:57:06.4296872Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T08:57:06.4298438Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> 
origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T08:57:06.4300035Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T08:57:06.4301830Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T08:57:06.4303450Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T08:57:06.4305189Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T08:57:06.4306818Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T08:57:06.4308470Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T08:57:06.4310163Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T08:57:06.4311950Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-12-04T08:57:06.4313568Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T08:57:06.4315239Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T08:57:06.4316930Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T08:57:06.4318904Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T08:57:06.4320863Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T08:57:06.4322646Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T08:57:06.4324345Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T08:57:06.4326515Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T08:57:06.4328234Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-12-04T08:57:06.4329992Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T08:57:06.4331746Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T08:57:06.4333585Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T08:57:06.4335330Z * [new branch] validations_2.8 -> 
origin/validations_2.8 2025-12-04T08:57:06.4337028Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T08:57:06.4338756Z * [new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T08:57:06.4340361Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T08:57:06.4342368Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T08:57:06.4344672Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T08:57:06.4346369Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T08:57:06.4348078Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T08:57:06.4350283Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T08:57:06.4352070Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T08:57:06.4354285Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-12-04T08:57:06.4356448Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T08:57:06.4357977Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T08:57:06.4359598Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T08:57:06.4361247Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T08:57:06.4362703Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T08:57:06.4364466Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T08:57:06.4366161Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T08:57:06.4367934Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T08:57:06.4369569Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T08:57:06.4371738Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T08:57:06.4373260Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T08:57:06.4374910Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 
2025-12-04T08:57:06.4376294Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T08:57:06.4377908Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T08:57:06.4379324Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T08:57:06.4380840Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T08:57:06.4382761Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T08:57:06.4384735Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T08:57:06.4386366Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T08:57:06.4388031Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T08:57:06.4389544Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T08:57:06.4391106Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T08:57:06.4392735Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T08:57:06.4394404Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T08:57:06.4395992Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T08:57:06.4397660Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-12-04T08:57:06.4399194Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T08:57:06.4400982Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T08:57:06.4403096Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T08:57:06.4404744Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T08:57:06.4406441Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-12-04T08:57:06.4408193Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T08:57:06.4409798Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 
-> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T08:57:06.4411265Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T08:57:06.4412875Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T08:57:06.4414529Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T08:57:06.4416737Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T08:57:06.4419907Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T08:57:06.4421469Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T08:57:06.4423524Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T08:57:06.4425096Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T08:57:06.4426733Z * [new branch] yolo-llama3 -> origin/yolo-llama3 2025-12-04T08:57:06.4428999Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T08:57:06.4430664Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T08:57:06.4432244Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T08:57:06.4433699Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T08:57:06.4435509Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T08:57:06.4437291Z * [new branch] zb2p -> origin/zb2p 2025-12-04T08:57:06.4438919Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T08:57:06.4441604Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T08:57:06.4443202Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T08:57:06.4444709Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T08:57:06.4446893Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> 
origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T08:57:06.4449021Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T08:57:06.4450595Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T08:57:06.4452189Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T08:57:06.4453908Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T08:57:06.4455452Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T08:57:06.4457522Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T08:57:06.4459171Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 2025-12-04T08:57:06.4460768Z * [new branch] zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards 2025-12-04T08:57:06.4462856Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T08:57:06.4464478Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T08:57:06.4466606Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-12-04T08:57:06.4468804Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T08:57:06.4470422Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T08:57:06.4472073Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T08:57:06.4473615Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T08:57:06.4475175Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T08:57:06.4476779Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T08:57:06.4478212Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 
2025-12-04T08:57:06.4479514Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-12-04T08:57:06.4481109Z * [new tag] ciflow/b200/115316 -> ciflow/b200/115316 2025-12-04T08:57:06.4482269Z * [new tag] ciflow/b200/160685 -> ciflow/b200/160685 2025-12-04T08:57:06.4483324Z * [new tag] ciflow/b200/161607 -> ciflow/b200/161607 2025-12-04T08:57:06.4484483Z * [new tag] ciflow/b200/161938 -> ciflow/b200/161938 2025-12-04T08:57:06.4485725Z * [new tag] ciflow/b200/167207 -> ciflow/b200/167207 2025-12-04T08:57:06.4486716Z * [new tag] ciflow/b200/167989 -> ciflow/b200/167989 2025-12-04T08:57:06.4487960Z * [new tag] ciflow/b200/168096 -> ciflow/b200/168096 2025-12-04T08:57:06.4489096Z * [new tag] ciflow/b200/168175 -> ciflow/b200/168175 2025-12-04T08:57:06.4490092Z * [new tag] ciflow/b200/168195 -> ciflow/b200/168195 2025-12-04T08:57:06.4491626Z * [new tag] ciflow/b200/169200 -> ciflow/b200/169200 2025-12-04T08:57:06.4492497Z * [new tag] ciflow/b200/169216 -> ciflow/b200/169216 2025-12-04T08:57:06.4494233Z * [new tag] ciflow/b200/169380 -> ciflow/b200/169380 2025-12-04T08:57:06.4495929Z * [new tag] ciflow/b200/169412 -> ciflow/b200/169412 2025-12-04T08:57:06.4497443Z * [new tag] ciflow/b200/169470 -> ciflow/b200/169470 2025-12-04T08:57:06.4498456Z * [new tag] ciflow/b200/169471 -> ciflow/b200/169471 2025-12-04T08:57:06.4499723Z * [new tag] ciflow/b200/169472 -> ciflow/b200/169472 2025-12-04T08:57:06.4501169Z * [new tag] ciflow/b200/169514 -> ciflow/b200/169514 2025-12-04T08:57:06.4502177Z * [new tag] ciflow/b200/169517 -> ciflow/b200/169517 2025-12-04T08:57:06.4503691Z * [new tag] ciflow/binaries/165922 -> ciflow/binaries/165922 2025-12-04T08:57:06.4504685Z * [new tag] ciflow/binaries/169510 -> ciflow/binaries/169510 2025-12-04T08:57:06.4506270Z * [new tag] ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994 2025-12-04T08:57:06.4507274Z * [new tag] ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829 2025-12-04T08:57:06.4508319Z * [new tag] 
ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972 2025-12-04T08:57:06.4509625Z * [new tag] ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981 2025-12-04T08:57:06.4510824Z * [new tag] ciflow/dynamo/167695 -> ciflow/dynamo/167695 2025-12-04T08:57:06.4511879Z * [new tag] ciflow/dynamo/168096 -> ciflow/dynamo/168096 2025-12-04T08:57:06.4513292Z * [new tag] ciflow/dynamo/169525 -> ciflow/dynamo/169525 2025-12-04T08:57:06.4514651Z * [new tag] ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938 2025-12-04T08:57:06.4515581Z * [new tag] ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940 2025-12-04T08:57:06.4517168Z * [new tag] ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923 2025-12-04T08:57:06.4518434Z * [new tag] ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552 2025-12-04T08:57:06.4519397Z * [new tag] ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129 2025-12-04T08:57:06.4520498Z * [new tag] ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917 2025-12-04T08:57:06.4521998Z * [new tag] ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156 2025-12-04T08:57:06.4522931Z * [new tag] ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200 2025-12-04T08:57:06.4523960Z * [new tag] ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216 2025-12-04T08:57:06.4525155Z * [new tag] ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338 2025-12-04T08:57:06.4526741Z * [new tag] ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355 2025-12-04T08:57:06.4527706Z * [new tag] ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543 2025-12-04T08:57:06.4529049Z * [new tag] ciflow/h100/115316 -> ciflow/h100/115316 2025-12-04T08:57:06.4530040Z * [new tag] ciflow/h100/160685 -> ciflow/h100/160685 2025-12-04T08:57:06.4531270Z * [new tag] ciflow/h100/160729 -> ciflow/h100/160729 2025-12-04T08:57:06.4532189Z * [new tag] ciflow/h100/161607 -> ciflow/h100/161607 
2025-12-04T08:57:06.4533359Z * [new tag] ciflow/h100/161938 -> ciflow/h100/161938 2025-12-04T08:57:06.4534307Z * [new tag] ciflow/h100/167207 -> ciflow/h100/167207 2025-12-04T08:57:06.4535641Z * [new tag] ciflow/h100/167989 -> ciflow/h100/167989 2025-12-04T08:57:06.4536421Z * [new tag] ciflow/h100/168096 -> ciflow/h100/168096 2025-12-04T08:57:06.4537417Z * [new tag] ciflow/h100/168175 -> ciflow/h100/168175 2025-12-04T08:57:06.4538646Z * [new tag] ciflow/h100/168195 -> ciflow/h100/168195 2025-12-04T08:57:06.4539510Z * [new tag] ciflow/h100/168980 -> ciflow/h100/168980 2025-12-04T08:57:06.4540944Z * [new tag] ciflow/h100/169200 -> ciflow/h100/169200 2025-12-04T08:57:06.4542360Z * [new tag] ciflow/h100/169216 -> ciflow/h100/169216 2025-12-04T08:57:06.4543821Z * [new tag] ciflow/h100/169380 -> ciflow/h100/169380 2025-12-04T08:57:06.4544752Z * [new tag] ciflow/h100/169412 -> ciflow/h100/169412 2025-12-04T08:57:06.4545987Z * [new tag] ciflow/h100/169470 -> ciflow/h100/169470 2025-12-04T08:57:06.4547081Z * [new tag] ciflow/h100/169471 -> ciflow/h100/169471 2025-12-04T08:57:06.4548228Z * [new tag] ciflow/h100/169472 -> ciflow/h100/169472 2025-12-04T08:57:06.4549350Z * [new tag] ciflow/h100/169514 -> ciflow/h100/169514 2025-12-04T08:57:06.4550684Z * [new tag] ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096 2025-12-04T08:57:06.4552240Z * [new tag] ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096 2025-12-04T08:57:06.4553513Z * [new tag] ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165 2025-12-04T08:57:06.4554399Z * [new tag] ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096 2025-12-04T08:57:06.4556311Z * [new tag] ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096 2025-12-04T08:57:06.4558017Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073 
2025-12-04T08:57:06.4559026Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096 2025-12-04T08:57:06.4560354Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024 2025-12-04T08:57:06.4561666Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024 2025-12-04T08:57:06.4562958Z * [new tag] ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096 2025-12-04T08:57:06.4564216Z * [new tag] ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096 2025-12-04T08:57:06.4565255Z * [new tag] ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024 2025-12-04T08:57:06.4566461Z * [new tag] ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425 2025-12-04T08:57:06.4567814Z * [new tag] ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545 2025-12-04T08:57:06.4569025Z * [new tag] ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997 2025-12-04T08:57:06.4570040Z * [new tag] ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096 2025-12-04T08:57:06.4571251Z * [new tag] ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063 2025-12-04T08:57:06.4572296Z * [new tag] ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425 2025-12-04T08:57:06.4573552Z * [new tag] ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545 2025-12-04T08:57:06.4574685Z * [new tag] ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096 2025-12-04T08:57:06.4575528Z * [new tag] ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063 2025-12-04T08:57:06.4576784Z * [new tag] ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425 2025-12-04T08:57:06.4578048Z * [new tag] ciflow/inductor-rocm/162052 -> 
ciflow/inductor-rocm/162052 2025-12-04T08:57:06.4579079Z * [new tag] ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971 2025-12-04T08:57:06.4580413Z * [new tag] ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096 2025-12-04T08:57:06.4581642Z * [new tag] ciflow/inductor/144542 -> ciflow/inductor/144542 2025-12-04T08:57:06.4582748Z * [new tag] ciflow/inductor/146506 -> ciflow/inductor/146506 2025-12-04T08:57:06.4583879Z * [new tag] ciflow/inductor/147990 -> ciflow/inductor/147990 2025-12-04T08:57:06.4585059Z * [new tag] ciflow/inductor/148294 -> ciflow/inductor/148294 2025-12-04T08:57:06.4586070Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-12-04T08:57:06.4587152Z * [new tag] ciflow/inductor/157149 -> ciflow/inductor/157149 2025-12-04T08:57:06.4588240Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-12-04T08:57:06.4589229Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-12-04T08:57:06.4590310Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-12-04T08:57:06.4591844Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-12-04T08:57:06.4592964Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-12-04T08:57:06.4594270Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-12-04T08:57:06.4595726Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-12-04T08:57:06.4597114Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-12-04T08:57:06.4598284Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-12-04T08:57:06.4599460Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-12-04T08:57:06.4600724Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-12-04T08:57:06.4601831Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-12-04T08:57:06.4602944Z * [new tag] ciflow/inductor/162795 -> ciflow/inductor/162795 2025-12-04T08:57:06.4604219Z * [new tag] 
ciflow/inductor/163245 -> ciflow/inductor/163245 2025-12-04T08:57:06.4605410Z * [new tag] ciflow/inductor/163335 -> ciflow/inductor/163335 2025-12-04T08:57:06.4606494Z * [new tag] ciflow/inductor/163503 -> ciflow/inductor/163503 2025-12-04T08:57:06.4607627Z * [new tag] ciflow/inductor/163942 -> ciflow/inductor/163942 2025-12-04T08:57:06.4608849Z * [new tag] ciflow/inductor/165270 -> ciflow/inductor/165270 2025-12-04T08:57:06.4610038Z * [new tag] ciflow/inductor/165274 -> ciflow/inductor/165274 2025-12-04T08:57:06.4611120Z * [new tag] ciflow/inductor/165322 -> ciflow/inductor/165322 2025-12-04T08:57:06.4612247Z * [new tag] ciflow/inductor/165597 -> ciflow/inductor/165597 2025-12-04T08:57:06.4613390Z * [new tag] ciflow/inductor/166063 -> ciflow/inductor/166063 2025-12-04T08:57:06.4614585Z * [new tag] ciflow/inductor/166075 -> ciflow/inductor/166075 2025-12-04T08:57:06.4615709Z * [new tag] ciflow/inductor/166165 -> ciflow/inductor/166165 2025-12-04T08:57:06.4617247Z * [new tag] ciflow/inductor/166254 -> ciflow/inductor/166254 2025-12-04T08:57:06.4618436Z * [new tag] ciflow/inductor/166483 -> ciflow/inductor/166483 2025-12-04T08:57:06.4619427Z * [new tag] ciflow/inductor/166494 -> ciflow/inductor/166494 2025-12-04T08:57:06.4620623Z * [new tag] ciflow/inductor/166545 -> ciflow/inductor/166545 2025-12-04T08:57:06.4621750Z * [new tag] ciflow/inductor/166788 -> ciflow/inductor/166788 2025-12-04T08:57:06.4622971Z * [new tag] ciflow/inductor/166846 -> ciflow/inductor/166846 2025-12-04T08:57:06.4624135Z * [new tag] ciflow/inductor/167300 -> ciflow/inductor/167300 2025-12-04T08:57:06.4625234Z * [new tag] ciflow/inductor/167407 -> ciflow/inductor/167407 2025-12-04T08:57:06.4626443Z * [new tag] ciflow/inductor/167536 -> ciflow/inductor/167536 2025-12-04T08:57:06.4627610Z * [new tag] ciflow/inductor/167552 -> ciflow/inductor/167552 2025-12-04T08:57:06.4628747Z * [new tag] ciflow/inductor/167555 -> ciflow/inductor/167555 2025-12-04T08:57:06.4630146Z * [new tag] 
ciflow/inductor/167583 -> ciflow/inductor/167583 2025-12-04T08:57:06.4631375Z * [new tag] ciflow/inductor/167599 -> ciflow/inductor/167599 2025-12-04T08:57:06.4632555Z * [new tag] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T08:57:06.4633718Z * [new tag] ciflow/inductor/167677 -> ciflow/inductor/167677 2025-12-04T08:57:06.4634815Z * [new tag] ciflow/inductor/167680 -> ciflow/inductor/167680 2025-12-04T08:57:06.4635950Z * [new tag] ciflow/inductor/167695 -> ciflow/inductor/167695 2025-12-04T08:57:06.4637087Z * [new tag] ciflow/inductor/167742 -> ciflow/inductor/167742 2025-12-04T08:57:06.4638229Z * [new tag] ciflow/inductor/167768 -> ciflow/inductor/167768 2025-12-04T08:57:06.4639568Z * [new tag] ciflow/inductor/167773 -> ciflow/inductor/167773 2025-12-04T08:57:06.4640833Z * [new tag] ciflow/inductor/167781 -> ciflow/inductor/167781 2025-12-04T08:57:06.4642039Z * [new tag] ciflow/inductor/167880 -> ciflow/inductor/167880 2025-12-04T08:57:06.4643220Z * [new tag] ciflow/inductor/167887 -> ciflow/inductor/167887 2025-12-04T08:57:06.4644319Z * [new tag] ciflow/inductor/167972 -> ciflow/inductor/167972 2025-12-04T08:57:06.4645417Z * [new tag] ciflow/inductor/167989 -> ciflow/inductor/167989 2025-12-04T08:57:06.4646573Z * [new tag] ciflow/inductor/168002 -> ciflow/inductor/168002 2025-12-04T08:57:06.4647699Z * [new tag] ciflow/inductor/168050 -> ciflow/inductor/168050 2025-12-04T08:57:06.4648827Z * [new tag] ciflow/inductor/168051 -> ciflow/inductor/168051 2025-12-04T08:57:06.4649986Z * [new tag] ciflow/inductor/168052 -> ciflow/inductor/168052 2025-12-04T08:57:06.4651124Z * [new tag] ciflow/inductor/168073 -> ciflow/inductor/168073 2025-12-04T08:57:06.4652293Z * [new tag] ciflow/inductor/168096 -> ciflow/inductor/168096 2025-12-04T08:57:06.4653420Z * [new tag] ciflow/inductor/168114 -> ciflow/inductor/168114 2025-12-04T08:57:06.4654620Z * [new tag] ciflow/inductor/168115 -> ciflow/inductor/168115 2025-12-04T08:57:06.4655741Z * [new tag] 
ciflow/inductor/168127 -> ciflow/inductor/168127 2025-12-04T08:57:06.4656860Z * [new tag] ciflow/inductor/168129 -> ciflow/inductor/168129 2025-12-04T08:57:06.4657980Z * [new tag] ciflow/inductor/168157 -> ciflow/inductor/168157 2025-12-04T08:57:06.4659093Z * [new tag] ciflow/inductor/168175 -> ciflow/inductor/168175 2025-12-04T08:57:06.4660320Z * [new tag] ciflow/inductor/168185 -> ciflow/inductor/168185 2025-12-04T08:57:06.4661235Z * [new tag] ciflow/inductor/168195 -> ciflow/inductor/168195 2025-12-04T08:57:06.4662458Z * [new tag] ciflow/inductor/168209 -> ciflow/inductor/168209 2025-12-04T08:57:06.4663606Z * [new tag] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T08:57:06.4664726Z * [new tag] ciflow/inductor/168316 -> ciflow/inductor/168316 2025-12-04T08:57:06.4665997Z * [new tag] ciflow/inductor/168326 -> ciflow/inductor/168326 2025-12-04T08:57:06.4667619Z * [new tag] ciflow/inductor/168368 -> ciflow/inductor/168368 2025-12-04T08:57:06.4668801Z * [new tag] ciflow/inductor/168894 -> ciflow/inductor/168894 2025-12-04T08:57:06.4670002Z * [new tag] ciflow/inductor/168934 -> ciflow/inductor/168934 2025-12-04T08:57:06.4671113Z * [new tag] ciflow/inductor/168939 -> ciflow/inductor/168939 2025-12-04T08:57:06.4672236Z * [new tag] ciflow/inductor/168946 -> ciflow/inductor/168946 2025-12-04T08:57:06.4673372Z * [new tag] ciflow/inductor/168950 -> ciflow/inductor/168950 2025-12-04T08:57:06.4674551Z * [new tag] ciflow/inductor/168951 -> ciflow/inductor/168951 2025-12-04T08:57:06.4675624Z * [new tag] ciflow/inductor/168952 -> ciflow/inductor/168952 2025-12-04T08:57:06.4676775Z * [new tag] ciflow/inductor/168955 -> ciflow/inductor/168955 2025-12-04T08:57:06.4677896Z * [new tag] ciflow/inductor/168971 -> ciflow/inductor/168971 2025-12-04T08:57:06.4679079Z * [new tag] ciflow/inductor/168979 -> ciflow/inductor/168979 2025-12-04T08:57:06.4680220Z * [new tag] ciflow/inductor/168980 -> ciflow/inductor/168980 2025-12-04T08:57:06.4681490Z * [new tag] 
ciflow/inductor/168983 -> ciflow/inductor/168983 2025-12-04T08:57:06.4682676Z * [new tag] ciflow/inductor/169006 -> ciflow/inductor/169006 2025-12-04T08:57:06.4683858Z * [new tag] ciflow/inductor/169023 -> ciflow/inductor/169023 2025-12-04T08:57:06.4684948Z * [new tag] ciflow/inductor/169024 -> ciflow/inductor/169024 2025-12-04T08:57:06.4686108Z * [new tag] ciflow/inductor/169025 -> ciflow/inductor/169025 2025-12-04T08:57:06.4687247Z * [new tag] ciflow/inductor/169066 -> ciflow/inductor/169066 2025-12-04T08:57:06.4688437Z * [new tag] ciflow/inductor/169091 -> ciflow/inductor/169091 2025-12-04T08:57:06.4689572Z * [new tag] ciflow/inductor/169102 -> ciflow/inductor/169102 2025-12-04T08:57:06.4690709Z * [new tag] ciflow/inductor/169103 -> ciflow/inductor/169103 2025-12-04T08:57:06.4691843Z * [new tag] ciflow/inductor/169121 -> ciflow/inductor/169121 2025-12-04T08:57:06.4693000Z * [new tag] ciflow/inductor/169134 -> ciflow/inductor/169134 2025-12-04T08:57:06.4694087Z * [new tag] ciflow/inductor/169135 -> ciflow/inductor/169135 2025-12-04T08:57:06.4695215Z * [new tag] ciflow/inductor/169141 -> ciflow/inductor/169141 2025-12-04T08:57:06.4696340Z * [new tag] ciflow/inductor/169151 -> ciflow/inductor/169151 2025-12-04T08:57:06.4697685Z * [new tag] ciflow/inductor/169161 -> ciflow/inductor/169161 2025-12-04T08:57:06.4698889Z * [new tag] ciflow/inductor/169167 -> ciflow/inductor/169167 2025-12-04T08:57:06.4700155Z * [new tag] ciflow/inductor/169177 -> ciflow/inductor/169177 2025-12-04T08:57:06.4701409Z * [new tag] ciflow/inductor/169185 -> ciflow/inductor/169185 2025-12-04T08:57:06.4702606Z * [new tag] ciflow/inductor/169196 -> ciflow/inductor/169196 2025-12-04T08:57:06.4703936Z * [new tag] ciflow/inductor/169200 -> ciflow/inductor/169200 2025-12-04T08:57:06.4704818Z * [new tag] ciflow/inductor/169204 -> ciflow/inductor/169204 2025-12-04T08:57:06.4706072Z * [new tag] ciflow/inductor/169216 -> ciflow/inductor/169216 2025-12-04T08:57:06.4707215Z * [new tag] 
ciflow/inductor/169219 -> ciflow/inductor/169219 2025-12-04T08:57:06.4708323Z * [new tag] ciflow/inductor/169220 -> ciflow/inductor/169220 2025-12-04T08:57:06.4709533Z * [new tag] ciflow/inductor/169230 -> ciflow/inductor/169230 2025-12-04T08:57:06.4710706Z * [new tag] ciflow/inductor/169242 -> ciflow/inductor/169242 2025-12-04T08:57:06.4711850Z * [new tag] ciflow/inductor/169245 -> ciflow/inductor/169245 2025-12-04T08:57:06.4713061Z * [new tag] ciflow/inductor/169260 -> ciflow/inductor/169260 2025-12-04T08:57:06.4714239Z * [new tag] ciflow/inductor/169282 -> ciflow/inductor/169282 2025-12-04T08:57:06.4715396Z * [new tag] ciflow/inductor/169286 -> ciflow/inductor/169286 2025-12-04T08:57:06.4716542Z * [new tag] ciflow/inductor/169299 -> ciflow/inductor/169299 2025-12-04T08:57:06.4717983Z * [new tag] ciflow/inductor/169304 -> ciflow/inductor/169304 2025-12-04T08:57:06.4719470Z * [new tag] ciflow/inductor/169305 -> ciflow/inductor/169305 2025-12-04T08:57:06.4720766Z * [new tag] ciflow/inductor/169308 -> ciflow/inductor/169308 2025-12-04T08:57:06.4721879Z * [new tag] ciflow/inductor/169319 -> ciflow/inductor/169319 2025-12-04T08:57:06.4722991Z * [new tag] ciflow/inductor/169326 -> ciflow/inductor/169326 2025-12-04T08:57:06.4724148Z * [new tag] ciflow/inductor/169332 -> ciflow/inductor/169332 2025-12-04T08:57:06.4725348Z * [new tag] ciflow/inductor/169333 -> ciflow/inductor/169333 2025-12-04T08:57:06.4726619Z * [new tag] ciflow/inductor/169336 -> ciflow/inductor/169336 2025-12-04T08:57:06.4727831Z * [new tag] ciflow/inductor/169340 -> ciflow/inductor/169340 2025-12-04T08:57:06.4728978Z * [new tag] ciflow/inductor/169341 -> ciflow/inductor/169341 2025-12-04T08:57:06.4730175Z * [new tag] ciflow/inductor/169343 -> ciflow/inductor/169343 2025-12-04T08:57:06.4731365Z * [new tag] ciflow/inductor/169346 -> ciflow/inductor/169346 2025-12-04T08:57:06.4732579Z * [new tag] ciflow/inductor/169348 -> ciflow/inductor/169348 2025-12-04T08:57:06.4733908Z * [new tag] 
ciflow/inductor/169350 -> ciflow/inductor/169350 2025-12-04T08:57:06.4735095Z * [new tag] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T08:57:06.4736254Z * [new tag] ciflow/inductor/169370 -> ciflow/inductor/169370 2025-12-04T08:57:06.4737566Z * [new tag] ciflow/inductor/169375 -> ciflow/inductor/169375 2025-12-04T08:57:06.4738767Z * [new tag] ciflow/inductor/169389 -> ciflow/inductor/169389 2025-12-04T08:57:06.4739950Z * [new tag] ciflow/inductor/169391 -> ciflow/inductor/169391 2025-12-04T08:57:06.4741083Z * [new tag] ciflow/inductor/169393 -> ciflow/inductor/169393 2025-12-04T08:57:06.4742262Z * [new tag] ciflow/inductor/169399 -> ciflow/inductor/169399 2025-12-04T08:57:06.4743954Z * [new tag] ciflow/inductor/169400 -> ciflow/inductor/169400 2025-12-04T08:57:06.4745168Z * [new tag] ciflow/inductor/169415 -> ciflow/inductor/169415 2025-12-04T08:57:06.4746312Z * [new tag] ciflow/inductor/169417 -> ciflow/inductor/169417 2025-12-04T08:57:06.4747596Z * [new tag] ciflow/inductor/169418 -> ciflow/inductor/169418 2025-12-04T08:57:06.4748840Z * [new tag] ciflow/inductor/169430 -> ciflow/inductor/169430 2025-12-04T08:57:06.4749996Z * [new tag] ciflow/inductor/169432 -> ciflow/inductor/169432 2025-12-04T08:57:06.4751092Z * [new tag] ciflow/inductor/169436 -> ciflow/inductor/169436 2025-12-04T08:57:06.4752300Z * [new tag] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T08:57:06.4753503Z * [new tag] ciflow/inductor/169438 -> ciflow/inductor/169438 2025-12-04T08:57:06.4754676Z * [new tag] ciflow/inductor/169441 -> ciflow/inductor/169441 2025-12-04T08:57:06.4755864Z * [new tag] ciflow/inductor/169446 -> ciflow/inductor/169446 2025-12-04T08:57:06.4757275Z * [new tag] ciflow/inductor/169447 -> ciflow/inductor/169447 2025-12-04T08:57:06.4758391Z * [new tag] ciflow/inductor/169452 -> ciflow/inductor/169452 2025-12-04T08:57:06.4771673Z * [new tag] ciflow/inductor/169455 -> ciflow/inductor/169455 2025-12-04T08:57:06.4772163Z * [new tag] 
ciflow/inductor/169459 -> ciflow/inductor/169459 2025-12-04T08:57:06.4772520Z * [new tag] ciflow/inductor/169463 -> ciflow/inductor/169463 2025-12-04T08:57:06.4772894Z * [new tag] ciflow/inductor/169476 -> ciflow/inductor/169476 2025-12-04T08:57:06.4773293Z * [new tag] ciflow/inductor/169485 -> ciflow/inductor/169485 2025-12-04T08:57:06.4773754Z * [new tag] ciflow/inductor/169493 -> ciflow/inductor/169493 2025-12-04T08:57:06.4774124Z * [new tag] ciflow/inductor/169496 -> ciflow/inductor/169496 2025-12-04T08:57:06.4774449Z * [new tag] ciflow/inductor/169497 -> ciflow/inductor/169497 2025-12-04T08:57:06.4774783Z * [new tag] ciflow/inductor/169503 -> ciflow/inductor/169503 2025-12-04T08:57:06.4775109Z * [new tag] ciflow/inductor/169504 -> ciflow/inductor/169504 2025-12-04T08:57:06.4775434Z * [new tag] ciflow/inductor/169505 -> ciflow/inductor/169505 2025-12-04T08:57:06.4775760Z * [new tag] ciflow/inductor/169508 -> ciflow/inductor/169508 2025-12-04T08:57:06.4776083Z * [new tag] ciflow/inductor/169509 -> ciflow/inductor/169509 2025-12-04T08:57:06.4776404Z * [new tag] ciflow/inductor/169513 -> ciflow/inductor/169513 2025-12-04T08:57:06.4777019Z * [new tag] ciflow/inductor/169514 -> ciflow/inductor/169514 2025-12-04T08:57:06.4777926Z * [new tag] ciflow/inductor/169515 -> ciflow/inductor/169515 2025-12-04T08:57:06.4778961Z * [new tag] ciflow/inductor/169517 -> ciflow/inductor/169517 2025-12-04T08:57:06.4780053Z * [new tag] ciflow/inductor/169519 -> ciflow/inductor/169519 2025-12-04T08:57:06.4781176Z * [new tag] ciflow/inductor/169520 -> ciflow/inductor/169520 2025-12-04T08:57:06.4782389Z * [new tag] ciflow/inductor/169521 -> ciflow/inductor/169521 2025-12-04T08:57:06.4783530Z * [new tag] ciflow/inductor/169524 -> ciflow/inductor/169524 2025-12-04T08:57:06.4784666Z * [new tag] ciflow/inductor/169527 -> ciflow/inductor/169527 2025-12-04T08:57:06.4785826Z * [new tag] ciflow/inductor/169528 -> ciflow/inductor/169528 2025-12-04T08:57:06.4787166Z * [new tag] 
ciflow/inductor/169532 -> ciflow/inductor/169532 2025-12-04T08:57:06.4788235Z * [new tag] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T08:57:06.4789342Z * [new tag] ciflow/inductor/169536 -> ciflow/inductor/169536 2025-12-04T08:57:06.4790497Z * [new tag] ciflow/inductor/169547 -> ciflow/inductor/169547 2025-12-04T08:57:06.4791878Z * [new tag] ciflow/inductor/169548 -> ciflow/inductor/169548 2025-12-04T08:57:06.4792898Z * [new tag] ciflow/inductor/169549 -> ciflow/inductor/169549 2025-12-04T08:57:06.4794021Z * [new tag] ciflow/inductor/169551 -> ciflow/inductor/169551 2025-12-04T08:57:06.4795186Z * [new tag] ciflow/inductor/169552 -> ciflow/inductor/169552 2025-12-04T08:57:06.4796633Z * [new tag] ciflow/inductor/169553 -> ciflow/inductor/169553 2025-12-04T08:57:06.4797753Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-12-04T08:57:06.4799193Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-12-04T08:57:06.4800447Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-12-04T08:57:06.4801946Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-12-04T08:57:06.4802895Z * [new tag] ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075 2025-12-04T08:57:06.4803932Z * [new tag] ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876 2025-12-04T08:57:06.4805001Z * [new tag] ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981 2025-12-04T08:57:06.4806503Z * [new tag] ciflow/mps/166254 -> ciflow/mps/166254 2025-12-04T08:57:06.4807296Z * [new tag] ciflow/mps/169017 -> ciflow/mps/169017 2025-12-04T08:57:06.4808591Z * [new tag] ciflow/mps/169372 -> ciflow/mps/169372 2025-12-04T08:57:06.4809596Z * [new tag] ciflow/mps/169478 -> ciflow/mps/169478 2025-12-04T08:57:06.4811082Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-12-04T08:57:06.4812016Z * [new tag] ciflow/op-benchmark/166075 -> ciflow/op-benchmark/166075 2025-12-04T08:57:06.4813041Z * [new 
tag] ciflow/op-benchmark/169544 -> ciflow/op-benchmark/169544 2025-12-04T08:57:06.4814528Z * [new tag] ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997 2025-12-04T08:57:06.4815652Z * [new tag] ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517 2025-12-04T08:57:06.4816730Z * [new tag] ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063 2025-12-04T08:57:06.4819583Z * [new tag] ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425 2025-12-04T08:57:06.4820801Z * [new tag] ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517 2025-12-04T08:57:06.4821988Z * [new tag] ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063 2025-12-04T08:57:06.4823061Z * [new tag] ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425 2025-12-04T08:57:06.4824601Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-12-04T08:57:06.4825618Z * [new tag] ciflow/periodic/167207 -> ciflow/periodic/167207 2025-12-04T08:57:06.4826756Z * [new tag] ciflow/periodic/167978 -> ciflow/periodic/167978 2025-12-04T08:57:06.4827819Z * [new tag] ciflow/periodic/168096 -> ciflow/periodic/168096 2025-12-04T08:57:06.4828884Z * [new tag] ciflow/periodic/169286 -> ciflow/periodic/169286 2025-12-04T08:57:06.4830229Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-12-04T08:57:06.4831354Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-12-04T08:57:06.4832531Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-12-04T08:57:06.4833900Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-12-04T08:57:06.4835833Z * [new tag] ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:06.4836977Z * [new tag] ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-12-04T08:57:06.4838642Z * [new tag] ciflow/periodic/csl/test87519 -> 
ciflow/periodic/csl/test87519 2025-12-04T08:57:06.4839850Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-12-04T08:57:06.4841413Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-12-04T08:57:06.4842627Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-12-04T08:57:06.4844114Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-12-04T08:57:06.4845551Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-12-04T08:57:06.4846684Z * [new tag] ciflow/pull/167207 -> ciflow/pull/167207 2025-12-04T08:57:06.4848328Z * [new tag] ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207 2025-12-04T08:57:06.4849481Z * [new tag] ciflow/rocm-mi200/165545 -> ciflow/rocm-mi200/165545 2025-12-04T08:57:06.4850638Z * [new tag] ciflow/rocm-mi200/165997 -> ciflow/rocm-mi200/165997 2025-12-04T08:57:06.4851730Z * [new tag] ciflow/rocm-mi200/168096 -> ciflow/rocm-mi200/168096 2025-12-04T08:57:06.4852909Z * [new tag] ciflow/rocm-mi200/168275 -> ciflow/rocm-mi200/168275 2025-12-04T08:57:06.4853962Z * [new tag] ciflow/rocm-mi200/169063 -> ciflow/rocm-mi200/169063 2025-12-04T08:57:06.4855648Z * [new tag] ciflow/rocm-mi200/169356 -> ciflow/rocm-mi200/169356 2025-12-04T08:57:06.4856690Z * [new tag] ciflow/rocm-mi200/169425 -> ciflow/rocm-mi200/169425 2025-12-04T08:57:06.4858071Z * [new tag] ciflow/rocm-mi300/165545 -> ciflow/rocm-mi300/165545 2025-12-04T08:57:06.4859208Z * [new tag] ciflow/rocm-mi300/167157 -> ciflow/rocm-mi300/167157 2025-12-04T08:57:06.4860282Z * [new tag] ciflow/rocm-mi300/168096 -> ciflow/rocm-mi300/168096 2025-12-04T08:57:06.4861890Z * [new tag] ciflow/rocm-mi300/169063 -> ciflow/rocm-mi300/169063 2025-12-04T08:57:06.4862947Z * [new tag] ciflow/rocm-mi300/169425 -> ciflow/rocm-mi300/169425 2025-12-04T08:57:06.4864284Z * [new tag] ciflow/rocm-mi355/167157 -> ciflow/rocm-mi355/167157 2025-12-04T08:57:06.4865292Z * [new 
tag] ciflow/rocm-mi355/168275 -> ciflow/rocm-mi355/168275 2025-12-04T08:57:06.4866415Z * [new tag] ciflow/rocm-mi355/169425 -> ciflow/rocm-mi355/169425 2025-12-04T08:57:06.4867862Z * [new tag] ciflow/rocm-navi31/168275 -> ciflow/rocm-navi31/168275 2025-12-04T08:57:06.4868912Z * [new tag] ciflow/rocm-navi31/169425 -> ciflow/rocm-navi31/169425 2025-12-04T08:57:06.4870235Z * [new tag] ciflow/rocm/115316 -> ciflow/rocm/115316 2025-12-04T08:57:06.4871265Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-12-04T08:57:06.4872273Z * [new tag] ciflow/rocm/160685 -> ciflow/rocm/160685 2025-12-04T08:57:06.4873327Z * [new tag] ciflow/rocm/161607 -> ciflow/rocm/161607 2025-12-04T08:57:06.4874372Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-12-04T08:57:06.4875441Z * [new tag] ciflow/rocm/165997 -> ciflow/rocm/165997 2025-12-04T08:57:06.4876471Z * [new tag] ciflow/rocm/166165 -> ciflow/rocm/166165 2025-12-04T08:57:06.4877541Z * [new tag] ciflow/rocm/166517 -> ciflow/rocm/166517 2025-12-04T08:57:06.4878581Z * [new tag] ciflow/rocm/167207 -> ciflow/rocm/167207 2025-12-04T08:57:06.4879750Z * [new tag] ciflow/rocm/167536 -> ciflow/rocm/167536 2025-12-04T08:57:06.4880839Z * [new tag] ciflow/rocm/167781 -> ciflow/rocm/167781 2025-12-04T08:57:06.4882235Z * [new tag] ciflow/rocm/167989 -> ciflow/rocm/167989 2025-12-04T08:57:06.4883721Z * [new tag] ciflow/rocm/168073 -> ciflow/rocm/168073 2025-12-04T08:57:06.4885090Z * [new tag] ciflow/rocm/168195 -> ciflow/rocm/168195 2025-12-04T08:57:06.4886136Z * [new tag] ciflow/rocm/168939 -> ciflow/rocm/168939 2025-12-04T08:57:06.4887242Z * [new tag] ciflow/rocm/168971 -> ciflow/rocm/168971 2025-12-04T08:57:06.4888357Z * [new tag] ciflow/rocm/169024 -> ciflow/rocm/169024 2025-12-04T08:57:06.4889514Z * [new tag] ciflow/rocm/169200 -> ciflow/rocm/169200 2025-12-04T08:57:06.4890600Z * [new tag] ciflow/rocm/169216 -> ciflow/rocm/169216 2025-12-04T08:57:06.4891711Z * [new tag] ciflow/rocm/169312 -> ciflow/rocm/169312 
2025-12-04T08:57:06.4892833Z * [new tag] ciflow/rocm/169380 -> ciflow/rocm/169380 2025-12-04T08:57:06.4894051Z * [new tag] ciflow/rocm/169427 -> ciflow/rocm/169427 2025-12-04T08:57:06.4895118Z * [new tag] ciflow/rocm/169455 -> ciflow/rocm/169455 2025-12-04T08:57:06.4896228Z * [new tag] ciflow/rocm/169470 -> ciflow/rocm/169470 2025-12-04T08:57:06.4897353Z * [new tag] ciflow/rocm/169471 -> ciflow/rocm/169471 2025-12-04T08:57:06.4898645Z * [new tag] ciflow/rocm/169472 -> ciflow/rocm/169472 2025-12-04T08:57:06.4899617Z * [new tag] ciflow/rocm/169514 -> ciflow/rocm/169514 2025-12-04T08:57:06.4901097Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-12-04T08:57:06.4902241Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-12-04T08:57:06.4903920Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-12-04T08:57:06.4904785Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-12-04T08:57:06.4905860Z * [new tag] ciflow/slow/167207 -> ciflow/slow/167207 2025-12-04T08:57:06.4906930Z * [new tag] ciflow/slow/168050 -> ciflow/slow/168050 2025-12-04T08:57:06.4908256Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-12-04T08:57:06.4909383Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-12-04T08:57:06.4911013Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-12-04T08:57:06.4912634Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-12-04T08:57:06.4914099Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-12-04T08:57:06.4915427Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-12-04T08:57:06.4916745Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-12-04T08:57:06.4918238Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-12-04T08:57:06.4919834Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 
2025-12-04T08:57:06.4920939Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-12-04T08:57:06.4922326Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-12-04T08:57:06.4923525Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-12-04T08:57:06.4925020Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-12-04T08:57:06.4926093Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-12-04T08:57:06.4927833Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-12-04T08:57:06.4928819Z * [new tag] ciflow/torchbench/168175 -> ciflow/torchbench/168175 2025-12-04T08:57:06.4930203Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-12-04T08:57:06.4931189Z * [new tag] ciflow/trunk/157149 -> ciflow/trunk/157149 2025-12-04T08:57:06.4932252Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-12-04T08:57:06.4933314Z * [new tag] ciflow/trunk/159718 -> ciflow/trunk/159718 2025-12-04T08:57:06.4934414Z * [new tag] ciflow/trunk/160685 -> ciflow/trunk/160685 2025-12-04T08:57:06.4935455Z * [new tag] ciflow/trunk/160729 -> ciflow/trunk/160729 2025-12-04T08:57:06.4936518Z * [new tag] ciflow/trunk/162275 -> ciflow/trunk/162275 2025-12-04T08:57:06.4937599Z * [new tag] ciflow/trunk/162795 -> ciflow/trunk/162795 2025-12-04T08:57:06.4938675Z * [new tag] ciflow/trunk/163245 -> ciflow/trunk/163245 2025-12-04T08:57:06.4939735Z * [new tag] ciflow/trunk/163942 -> ciflow/trunk/163942 2025-12-04T08:57:06.4940748Z * [new tag] ciflow/trunk/165274 -> ciflow/trunk/165274 2025-12-04T08:57:06.4942321Z * [new tag] ciflow/trunk/165483 -> ciflow/trunk/165483 2025-12-04T08:57:06.4943757Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T08:57:06.4945103Z * [new tag] ciflow/trunk/165922 -> ciflow/trunk/165922 2025-12-04T08:57:06.4946446Z * [new tag] ciflow/trunk/166075 -> ciflow/trunk/166075 2025-12-04T08:57:06.4947496Z * [new tag] ciflow/trunk/166165 -> 
ciflow/trunk/166165 2025-12-04T08:57:06.4948609Z * [new tag] ciflow/trunk/166829 -> ciflow/trunk/166829 2025-12-04T08:57:06.4949989Z * [new tag] ciflow/trunk/166843 -> ciflow/trunk/166843 2025-12-04T08:57:06.4951056Z * [new tag] ciflow/trunk/166876 -> ciflow/trunk/166876 2025-12-04T08:57:06.4952172Z * [new tag] ciflow/trunk/167207 -> ciflow/trunk/167207 2025-12-04T08:57:06.4953291Z * [new tag] ciflow/trunk/167536 -> ciflow/trunk/167536 2025-12-04T08:57:06.4954518Z * [new tag] ciflow/trunk/167552 -> ciflow/trunk/167552 2025-12-04T08:57:06.4955598Z * [new tag] ciflow/trunk/167555 -> ciflow/trunk/167555 2025-12-04T08:57:06.4956815Z * [new tag] ciflow/trunk/167599 -> ciflow/trunk/167599 2025-12-04T08:57:06.4957943Z * [new tag] ciflow/trunk/167659 -> ciflow/trunk/167659 2025-12-04T08:57:06.4959377Z * [new tag] ciflow/trunk/167672 -> ciflow/trunk/167672 2025-12-04T08:57:06.4960475Z * [new tag] ciflow/trunk/167742 -> ciflow/trunk/167742 2025-12-04T08:57:06.4961612Z * [new tag] ciflow/trunk/167781 -> ciflow/trunk/167781 2025-12-04T08:57:06.4962978Z * [new tag] ciflow/trunk/167837 -> ciflow/trunk/167837 2025-12-04T08:57:06.4964048Z * [new tag] ciflow/trunk/167887 -> ciflow/trunk/167887 2025-12-04T08:57:06.4965156Z * [new tag] ciflow/trunk/167978 -> ciflow/trunk/167978 2025-12-04T08:57:06.4966381Z * [new tag] ciflow/trunk/168050 -> ciflow/trunk/168050 2025-12-04T08:57:06.4967533Z * [new tag] ciflow/trunk/168051 -> ciflow/trunk/168051 2025-12-04T08:57:06.4968811Z * [new tag] ciflow/trunk/168096 -> ciflow/trunk/168096 2025-12-04T08:57:06.4969785Z * [new tag] ciflow/trunk/168127 -> ciflow/trunk/168127 2025-12-04T08:57:06.4970840Z * [new tag] ciflow/trunk/168157 -> ciflow/trunk/168157 2025-12-04T08:57:06.4972064Z * [new tag] ciflow/trunk/168175 -> ciflow/trunk/168175 2025-12-04T08:57:06.4973152Z * [new tag] ciflow/trunk/168209 -> ciflow/trunk/168209 2025-12-04T08:57:06.4974333Z * [new tag] ciflow/trunk/168213 -> ciflow/trunk/168213 2025-12-04T08:57:06.4975665Z * [new tag] 
ciflow/trunk/168226 -> ciflow/trunk/168226 2025-12-04T08:57:06.4976768Z * [new tag] ciflow/trunk/168262 -> ciflow/trunk/168262 2025-12-04T08:57:06.4977961Z * [new tag] ciflow/trunk/168275 -> ciflow/trunk/168275 2025-12-04T08:57:06.4979140Z * [new tag] ciflow/trunk/168328 -> ciflow/trunk/168328 2025-12-04T08:57:06.4980298Z * [new tag] ciflow/trunk/168368 -> ciflow/trunk/168368 2025-12-04T08:57:06.4981489Z * [new tag] ciflow/trunk/168917 -> ciflow/trunk/168917 2025-12-04T08:57:06.4982582Z * [new tag] ciflow/trunk/168933 -> ciflow/trunk/168933 2025-12-04T08:57:06.4983981Z * [new tag] ciflow/trunk/168941 -> ciflow/trunk/168941 2025-12-04T08:57:06.4985050Z * [new tag] ciflow/trunk/168955 -> ciflow/trunk/168955 2025-12-04T08:57:06.4986089Z * [new tag] ciflow/trunk/168980 -> ciflow/trunk/168980 2025-12-04T08:57:06.4987432Z * [new tag] ciflow/trunk/169004 -> ciflow/trunk/169004 2025-12-04T08:57:06.4988588Z * [new tag] ciflow/trunk/169006 -> ciflow/trunk/169006 2025-12-04T08:57:06.4989636Z * [new tag] ciflow/trunk/169023 -> ciflow/trunk/169023 2025-12-04T08:57:06.4990706Z * [new tag] ciflow/trunk/169025 -> ciflow/trunk/169025 2025-12-04T08:57:06.4991867Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T08:57:06.4993521Z * [new tag] ciflow/trunk/169066 -> ciflow/trunk/169066 2025-12-04T08:57:06.4994694Z * [new tag] ciflow/trunk/169091 -> ciflow/trunk/169091 2025-12-04T08:57:06.4995786Z * [new tag] ciflow/trunk/169102 -> ciflow/trunk/169102 2025-12-04T08:57:06.4996957Z * [new tag] ciflow/trunk/169103 -> ciflow/trunk/169103 2025-12-04T08:57:06.4998239Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T08:57:06.4999497Z * [new tag] ciflow/trunk/169139 -> ciflow/trunk/169139 2025-12-04T08:57:06.5000985Z * [new tag] ciflow/trunk/169148 -> ciflow/trunk/169148 2025-12-04T08:57:06.5002120Z * [new tag] ciflow/trunk/169151 -> ciflow/trunk/169151 2025-12-04T08:57:06.5003204Z * [new tag] ciflow/trunk/169156 -> ciflow/trunk/169156 
2025-12-04T08:57:06.5004648Z * [new tag] ciflow/trunk/169176 -> ciflow/trunk/169176 2025-12-04T08:57:06.5005744Z * [new tag] ciflow/trunk/169204 -> ciflow/trunk/169204 2025-12-04T08:57:06.5006794Z * [new tag] ciflow/trunk/169207 -> ciflow/trunk/169207 2025-12-04T08:57:06.5007971Z * [new tag] ciflow/trunk/169211 -> ciflow/trunk/169211 2025-12-04T08:57:06.5009491Z * [new tag] ciflow/trunk/169229 -> ciflow/trunk/169229 2025-12-04T08:57:06.5010607Z * [new tag] ciflow/trunk/169231 -> ciflow/trunk/169231 2025-12-04T08:57:06.5011715Z * [new tag] ciflow/trunk/169260 -> ciflow/trunk/169260 2025-12-04T08:57:06.5013247Z * [new tag] ciflow/trunk/169271 -> ciflow/trunk/169271 2025-12-04T08:57:06.5014449Z * [new tag] ciflow/trunk/169280 -> ciflow/trunk/169280 2025-12-04T08:57:06.5015471Z * [new tag] ciflow/trunk/169281 -> ciflow/trunk/169281 2025-12-04T08:57:06.5016582Z * [new tag] ciflow/trunk/169286 -> ciflow/trunk/169286 2025-12-04T08:57:06.5017985Z * [new tag] ciflow/trunk/169293 -> ciflow/trunk/169293 2025-12-04T08:57:06.5019135Z * [new tag] ciflow/trunk/169296 -> ciflow/trunk/169296 2025-12-04T08:57:06.5020374Z * [new tag] ciflow/trunk/169304 -> ciflow/trunk/169304 2025-12-04T08:57:06.5021443Z * [new tag] ciflow/trunk/169305 -> ciflow/trunk/169305 2025-12-04T08:57:06.5022633Z * [new tag] ciflow/trunk/169312 -> ciflow/trunk/169312 2025-12-04T08:57:06.5024172Z * [new tag] ciflow/trunk/169328 -> ciflow/trunk/169328 2025-12-04T08:57:06.5025155Z * [new tag] ciflow/trunk/169343 -> ciflow/trunk/169343 2025-12-04T08:57:06.5026374Z * [new tag] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T08:57:06.5027377Z * [new tag] ciflow/trunk/169370 -> ciflow/trunk/169370 2025-12-04T08:57:06.5028740Z * [new tag] ciflow/trunk/169379 -> ciflow/trunk/169379 2025-12-04T08:57:06.5029841Z * [new tag] ciflow/trunk/169380 -> ciflow/trunk/169380 2025-12-04T08:57:06.5030939Z * [new tag] ciflow/trunk/169385 -> ciflow/trunk/169385 2025-12-04T08:57:06.5032041Z * [new tag] ciflow/trunk/169387 -> 
ciflow/trunk/169387 2025-12-04T08:57:06.5033360Z * [new tag] ciflow/trunk/169410 -> ciflow/trunk/169410 2025-12-04T08:57:06.5034516Z * [new tag] ciflow/trunk/169412 -> ciflow/trunk/169412 2025-12-04T08:57:06.5035669Z * [new tag] ciflow/trunk/169418 -> ciflow/trunk/169418 2025-12-04T08:57:06.5036969Z * [new tag] ciflow/trunk/169423 -> ciflow/trunk/169423 2025-12-04T08:57:06.5037975Z * [new tag] ciflow/trunk/169427 -> ciflow/trunk/169427 2025-12-04T08:57:06.5039072Z * [new tag] ciflow/trunk/169430 -> ciflow/trunk/169430 2025-12-04T08:57:06.5040325Z * [new tag] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T08:57:06.5041468Z * [new tag] ciflow/trunk/169442 -> ciflow/trunk/169442 2025-12-04T08:57:06.5042598Z * [new tag] ciflow/trunk/169452 -> ciflow/trunk/169452 2025-12-04T08:57:06.5043738Z * [new tag] ciflow/trunk/169454 -> ciflow/trunk/169454 2025-12-04T08:57:06.5044843Z * [new tag] ciflow/trunk/169459 -> ciflow/trunk/169459 2025-12-04T08:57:06.5046183Z * [new tag] ciflow/trunk/169474 -> ciflow/trunk/169474 2025-12-04T08:57:06.5047266Z * [new tag] ciflow/trunk/169475 -> ciflow/trunk/169475 2025-12-04T08:57:06.5048401Z * [new tag] ciflow/trunk/169476 -> ciflow/trunk/169476 2025-12-04T08:57:06.5049696Z * [new tag] ciflow/trunk/169487 -> ciflow/trunk/169487 2025-12-04T08:57:06.5050899Z * [new tag] ciflow/trunk/169497 -> ciflow/trunk/169497 2025-12-04T08:57:06.5052079Z * [new tag] ciflow/trunk/169503 -> ciflow/trunk/169503 2025-12-04T08:57:06.5053259Z * [new tag] ciflow/trunk/169505 -> ciflow/trunk/169505 2025-12-04T08:57:06.5054350Z * [new tag] ciflow/trunk/169507 -> ciflow/trunk/169507 2025-12-04T08:57:06.5055484Z * [new tag] ciflow/trunk/169514 -> ciflow/trunk/169514 2025-12-04T08:57:06.5056736Z * [new tag] ciflow/trunk/169517 -> ciflow/trunk/169517 2025-12-04T08:57:06.5057930Z * [new tag] ciflow/trunk/169519 -> ciflow/trunk/169519 2025-12-04T08:57:06.5058977Z * [new tag] ciflow/trunk/169528 -> ciflow/trunk/169528 2025-12-04T08:57:06.5060067Z * [new tag] 
ciflow/trunk/169541 -> ciflow/trunk/169541 2025-12-04T08:57:06.5061310Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T08:57:06.5063029Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-12-04T08:57:06.5064273Z * [new tag] ciflow/vllm/165270 -> ciflow/vllm/165270 2025-12-04T08:57:06.5065365Z * [new tag] ciflow/vllm/165274 -> ciflow/vllm/165274 2025-12-04T08:57:06.5066524Z * [new tag] ciflow/vllm/166494 -> ciflow/vllm/166494 2025-12-04T08:57:06.5067605Z * [new tag] ciflow/vllm/169219 -> ciflow/vllm/169219 2025-12-04T08:57:06.5068610Z * [new tag] ciflow/vllm/169220 -> ciflow/vllm/169220 2025-12-04T08:57:06.5070598Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-12-04T08:57:06.5071584Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-12-04T08:57:06.5072642Z * [new tag] ciflow/xpu/161940 -> ciflow/xpu/161940 2025-12-04T08:57:06.5073901Z * [new tag] ciflow/xpu/163251 -> ciflow/xpu/163251 2025-12-04T08:57:06.5074894Z * [new tag] ciflow/xpu/166829 -> ciflow/xpu/166829 2025-12-04T08:57:06.5075930Z * [new tag] ciflow/xpu/166843 -> ciflow/xpu/166843 2025-12-04T08:57:06.5077236Z * [new tag] ciflow/xpu/167972 -> ciflow/xpu/167972 2025-12-04T08:57:06.5078168Z * [new tag] ciflow/xpu/167981 -> ciflow/xpu/167981 2025-12-04T08:57:06.5079191Z * [new tag] ciflow/xpu/168213 -> ciflow/xpu/168213 2025-12-04T08:57:06.5080353Z * [new tag] ciflow/xpu/168262 -> ciflow/xpu/168262 2025-12-04T08:57:06.5081386Z * [new tag] ciflow/xpu/168328 -> ciflow/xpu/168328 2025-12-04T08:57:06.5082857Z * [new tag] ciflow/xpu/168950 -> ciflow/xpu/168950 2025-12-04T08:57:06.5084472Z * [new tag] ciflow/xpu/169039 -> ciflow/xpu/169039 2025-12-04T08:57:06.5085628Z * [new tag] ciflow/xpu/169200 -> ciflow/xpu/169200 2025-12-04T08:57:06.5086786Z * [new tag] ciflow/xpu/169203 -> ciflow/xpu/169203 2025-12-04T08:57:06.5087989Z * [new tag] ciflow/xpu/169229 -> ciflow/xpu/169229 2025-12-04T08:57:06.5089118Z * [new tag] ciflow/xpu/169230 -> ciflow/xpu/169230 
2025-12-04T08:57:06.5090178Z * [new tag] ciflow/xpu/169231 -> ciflow/xpu/169231 2025-12-04T08:57:06.5091387Z * [new tag] ciflow/xpu/169241 -> ciflow/xpu/169241 2025-12-04T08:57:06.5092540Z * [new tag] ciflow/xpu/169280 -> ciflow/xpu/169280 2025-12-04T08:57:06.5093778Z * [new tag] ciflow/xpu/169296 -> ciflow/xpu/169296 2025-12-04T08:57:06.5094926Z * [new tag] ciflow/xpu/169353 -> ciflow/xpu/169353 2025-12-04T08:57:06.5096062Z * [new tag] ciflow/xpu/169410 -> ciflow/xpu/169410 2025-12-04T08:57:06.5097247Z * [new tag] ciflow/xpu/169442 -> ciflow/xpu/169442 2025-12-04T08:57:06.5098484Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T08:57:06.5099553Z * [new tag] cslpull75 -> cslpull75 2025-12-04T08:57:06.5100673Z * [new tag] cslpull76 -> cslpull76 2025-12-04T08:57:06.5101743Z * [new tag] cslpull77 -> cslpull77 2025-12-04T08:57:06.5102918Z * [new tag] cslpull78 -> cslpull78 2025-12-04T08:57:06.5104566Z * [new tag] cslpull79 -> cslpull79 2025-12-04T08:57:06.5105763Z * [new tag] cslpull80 -> cslpull80 2025-12-04T08:57:06.5107124Z * [new tag] cslpull81 -> cslpull81 2025-12-04T08:57:06.5108427Z * [new tag] cslpull82 -> cslpull82 2025-12-04T08:57:06.5109526Z * [new tag] cslpull83 -> cslpull83 2025-12-04T08:57:06.5110673Z * [new tag] cslpull84 -> cslpull84 2025-12-04T08:57:06.5111890Z * [new tag] cslpull85 -> cslpull85 2025-12-04T08:57:06.5113273Z * [new tag] cslpull86 -> cslpull86 2025-12-04T08:57:06.5114423Z * [new tag] cslpull87 -> cslpull87 2025-12-04T08:57:06.5115518Z * [new tag] cslpull88 -> cslpull88 2025-12-04T08:57:06.5116754Z * [new tag] cslpull89 -> cslpull89 2025-12-04T08:57:06.5118062Z * [new tag] cslpull90 -> cslpull90 2025-12-04T08:57:06.5119620Z * [new tag] cslpull91 -> cslpull91 2025-12-04T08:57:06.5120768Z * [new tag] cslpull92 -> cslpull92 2025-12-04T08:57:06.5122169Z * [new tag] flight_5 -> flight_5 2025-12-04T08:57:06.5123461Z * [new tag] flight_5.1 -> flight_5.1 2025-12-04T08:57:06.5124626Z * [new tag] flight_5.2 -> flight_5.2 
2025-12-04T08:57:06.5126214Z * [new tag] flight_5.3 -> flight_5.3 2025-12-04T08:57:06.5127250Z * [new tag] forpull1 -> forpull1 2025-12-04T08:57:06.5128828Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-12-04T08:57:06.5129984Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-12-04T08:57:06.5131116Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-12-04T08:57:06.5132379Z * [new tag] nightly-binary -> nightly-binary 2025-12-04T08:57:06.5133664Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-12-04T08:57:06.5134957Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-12-04T08:57:06.5136641Z * [new tag] trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 2025-12-04T08:57:06.5137794Z * [new tag] trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e 2025-12-04T08:57:06.5139242Z * [new tag] trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 2025-12-04T08:57:06.5141088Z * [new tag] trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654 2025-12-04T08:57:06.5142196Z * [new tag] trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb 2025-12-04T08:57:06.5143460Z * [new tag] trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b 2025-12-04T08:57:06.5144832Z * [new tag] trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 2025-12-04T08:57:06.5146017Z * [new tag] trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75 2025-12-04T08:57:06.5147460Z * [new tag] trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 2025-12-04T08:57:06.5148754Z * [new tag] trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 
2025-12-04T08:57:06.5150182Z * [new tag] trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688 2025-12-04T08:57:06.5151333Z * [new tag] trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10 2025-12-04T08:57:06.5152593Z * [new tag] trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec 2025-12-04T08:57:06.5153800Z * [new tag] trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f 2025-12-04T08:57:06.5156043Z * [new tag] trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 2025-12-04T08:57:06.5157234Z * [new tag] trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 2025-12-04T08:57:06.5158000Z * [new tag] trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 2025-12-04T08:57:06.5159247Z * [new tag] trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 2025-12-04T08:57:06.5160518Z * [new tag] trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e 2025-12-04T08:57:06.5161783Z * [new tag] trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520 2025-12-04T08:57:06.5163083Z * [new tag] trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 2025-12-04T08:57:06.5164418Z * [new tag] trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c 2025-12-04T08:57:06.5165544Z * [new tag] trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d 2025-12-04T08:57:06.5166821Z * [new tag] trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 2025-12-04T08:57:06.5168120Z * [new tag] 
trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de 2025-12-04T08:57:06.5169389Z * [new tag] trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 2025-12-04T08:57:06.5170682Z * [new tag] trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 2025-12-04T08:57:06.5171912Z * [new tag] trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f 2025-12-04T08:57:06.5173274Z * [new tag] trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9 2025-12-04T08:57:06.5174449Z * [new tag] trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87 2025-12-04T08:57:06.5175709Z * [new tag] trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc 2025-12-04T08:57:06.5176939Z * [new tag] trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25 2025-12-04T08:57:06.5178246Z * [new tag] trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 2025-12-04T08:57:06.5179529Z * [new tag] trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf 2025-12-04T08:57:06.5180669Z * [new tag] trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd 2025-12-04T08:57:06.5181914Z * [new tag] trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 2025-12-04T08:57:06.5183282Z * [new tag] trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 2025-12-04T08:57:06.5184530Z * [new tag] trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35 2025-12-04T08:57:06.5185811Z * [new tag] trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> 
trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec 2025-12-04T08:57:06.5187350Z * [new tag] trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 2025-12-04T08:57:06.5188543Z * [new tag] trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1 2025-12-04T08:57:06.5189802Z * [new tag] trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 2025-12-04T08:57:06.5191605Z * [new tag] trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 2025-12-04T08:57:06.5192843Z * [new tag] trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 2025-12-04T08:57:06.5194235Z * [new tag] trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf 2025-12-04T08:57:06.5195535Z * [new tag] trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee 2025-12-04T08:57:06.5196828Z * [new tag] trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 2025-12-04T08:57:06.5197978Z * [new tag] trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 2025-12-04T08:57:06.5199244Z * [new tag] trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae 2025-12-04T08:57:06.5200605Z * [new tag] trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f 2025-12-04T08:57:06.5201845Z * [new tag] trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c 2025-12-04T08:57:06.5203180Z * [new tag] trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247 2025-12-04T08:57:06.5204393Z * [new tag] trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c 
2025-12-04T08:57:06.5205636Z * [new tag] trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a 2025-12-04T08:57:06.5206922Z * [new tag] trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686 2025-12-04T08:57:06.5208553Z * [new tag] trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 2025-12-04T08:57:06.5209589Z * [new tag] trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af 2025-12-04T08:57:06.5210838Z * [new tag] trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 2025-12-04T08:57:06.5212172Z * [new tag] trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873 2025-12-04T08:57:06.5213422Z * [new tag] trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 2025-12-04T08:57:06.5214564Z * [new tag] trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f 2025-12-04T08:57:06.5215965Z * [new tag] trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa 2025-12-04T08:57:06.5217355Z * [new tag] trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c 2025-12-04T08:57:06.5221042Z * [new tag] trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a 2025-12-04T08:57:06.5222096Z * [new tag] trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d 2025-12-04T08:57:06.5223301Z * [new tag] trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 2025-12-04T08:57:06.5224565Z * [new tag] trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 2025-12-04T08:57:06.5225894Z * [new tag] 
trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a 2025-12-04T08:57:06.5227217Z * [new tag] trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 2025-12-04T08:57:06.5228495Z * [new tag] trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 2025-12-04T08:57:06.5229870Z * [new tag] trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 2025-12-04T08:57:06.5231184Z * [new tag] trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96 2025-12-04T08:57:06.5232335Z * [new tag] trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc 2025-12-04T08:57:06.5233617Z * [new tag] trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 2025-12-04T08:57:06.5234887Z * [new tag] trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd 2025-12-04T08:57:06.5236018Z * [new tag] trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 2025-12-04T08:57:06.5237396Z * [new tag] trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 2025-12-04T08:57:06.5238668Z * [new tag] trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c 2025-12-04T08:57:06.5240006Z * [new tag] trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b 2025-12-04T08:57:06.5241200Z * [new tag] trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c 2025-12-04T08:57:06.5242406Z * [new tag] trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 2025-12-04T08:57:06.5243761Z * [new tag] trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> 
trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 2025-12-04T08:57:06.5245068Z * [new tag] trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef 2025-12-04T08:57:06.5246328Z * [new tag] trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719 2025-12-04T08:57:06.5247629Z * [new tag] trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b 2025-12-04T08:57:06.5248914Z * [new tag] trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a 2025-12-04T08:57:06.5250221Z * [new tag] trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206 2025-12-04T08:57:06.5251484Z * [new tag] trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 2025-12-04T08:57:06.5252742Z * [new tag] trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 2025-12-04T08:57:06.5254165Z * [new tag] trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d 2025-12-04T08:57:06.5255322Z * [new tag] trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b 2025-12-04T08:57:06.5256531Z * [new tag] trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 2025-12-04T08:57:06.5257818Z * [new tag] trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 2025-12-04T08:57:06.5259150Z * [new tag] trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec 2025-12-04T08:57:06.5260363Z * [new tag] trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 2025-12-04T08:57:06.5261635Z * [new tag] trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d 
2025-12-04T08:57:06.5262928Z * [new tag] trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a 2025-12-04T08:57:06.5264226Z * [new tag] trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e 2025-12-04T08:57:06.5265462Z * [new tag] trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 2025-12-04T08:57:06.5266704Z * [new tag] trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485 2025-12-04T08:57:06.5268010Z * [new tag] trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734 2025-12-04T08:57:06.5269344Z * [new tag] trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f 2025-12-04T08:57:06.5270651Z * [new tag] trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 2025-12-04T08:57:06.5271998Z * [new tag] trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03 2025-12-04T08:57:06.5273656Z * [new tag] trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 2025-12-04T08:57:06.5274842Z * [new tag] trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 2025-12-04T08:57:06.5276343Z * [new tag] trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 2025-12-04T08:57:06.5277615Z * [new tag] trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca 2025-12-04T08:57:06.5278881Z * [new tag] trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b 2025-12-04T08:57:06.5280219Z * [new tag] trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609 2025-12-04T08:57:06.5281608Z * [new tag] 
trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b 2025-12-04T08:57:06.5282715Z * [new tag] trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:06.5283957Z * [new tag] trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 2025-12-04T08:57:06.5285181Z * [new tag] trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed 2025-12-04T08:57:06.5286471Z * [new tag] trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 2025-12-04T08:57:06.5287854Z * [new tag] trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e 2025-12-04T08:57:06.5288923Z * [new tag] trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead 2025-12-04T08:57:06.5290212Z * [new tag] trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718 2025-12-04T08:57:06.5291670Z * [new tag] trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0 2025-12-04T08:57:06.5293274Z * [new tag] trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75 2025-12-04T08:57:06.5294402Z * [new tag] trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece 2025-12-04T08:57:06.5295537Z * [new tag] trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6 2025-12-04T08:57:06.5296924Z * [new tag] trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 2025-12-04T08:57:06.5298165Z * [new tag] trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c 2025-12-04T08:57:06.5299425Z * [new tag] trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> 
trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 2025-12-04T08:57:06.5300791Z * [new tag] trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 2025-12-04T08:57:06.5302122Z * [new tag] trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca 2025-12-04T08:57:06.5303461Z * [new tag] trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38 2025-12-04T08:57:06.5304591Z * [new tag] trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 2025-12-04T08:57:06.5305816Z * [new tag] trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c 2025-12-04T08:57:06.5307106Z * [new tag] trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 2025-12-04T08:57:06.5308312Z * [new tag] trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 2025-12-04T08:57:06.5309655Z * [new tag] trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa 2025-12-04T08:57:06.5310960Z * [new tag] trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d 2025-12-04T08:57:06.5312308Z * [new tag] trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 2025-12-04T08:57:06.5313671Z * [new tag] trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 2025-12-04T08:57:06.5314962Z * [new tag] trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d 2025-12-04T08:57:06.5316253Z * [new tag] trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a 2025-12-04T08:57:06.5317767Z * [new tag] trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 
2025-12-04T08:57:06.5319162Z * [new tag] trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 2025-12-04T08:57:06.5320460Z * [new tag] trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa 2025-12-04T08:57:06.5321775Z * [new tag] trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d 2025-12-04T08:57:06.5323509Z * [new tag] trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c 2025-12-04T08:57:06.5324600Z * [new tag] trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 2025-12-04T08:57:06.5325825Z * [new tag] trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c 2025-12-04T08:57:06.5326987Z * [new tag] trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45 2025-12-04T08:57:06.5328445Z * [new tag] trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 2025-12-04T08:57:06.5329689Z * [new tag] trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e 2025-12-04T08:57:06.5330972Z * [new tag] trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e 2025-12-04T08:57:06.5332132Z * [new tag] trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e 2025-12-04T08:57:06.5333344Z * [new tag] trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 2025-12-04T08:57:06.5334631Z * [new tag] trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 2025-12-04T08:57:06.5335963Z * [new tag] trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 2025-12-04T08:57:06.5337317Z * [new tag] 
trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99 2025-12-04T08:57:06.5338659Z * [new tag] trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 2025-12-04T08:57:06.5340058Z * [new tag] trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 2025-12-04T08:57:06.5341399Z * [new tag] trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a 2025-12-04T08:57:06.5342713Z * [new tag] trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 2025-12-04T08:57:06.5344033Z * [new tag] trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 2025-12-04T08:57:06.5345385Z * [new tag] trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b 2025-12-04T08:57:06.5346711Z * [new tag] trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e 2025-12-04T08:57:06.5348101Z * [new tag] trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf 2025-12-04T08:57:06.5349399Z * [new tag] trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 2025-12-04T08:57:06.5350806Z * [new tag] trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f 2025-12-04T08:57:06.5352131Z * [new tag] trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f 2025-12-04T08:57:06.5353489Z * [new tag] trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2 2025-12-04T08:57:06.5354769Z * [new tag] trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 2025-12-04T08:57:06.5356317Z * [new tag] trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> 
trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 2025-12-04T08:57:06.5357698Z * [new tag] trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 2025-12-04T08:57:06.5358972Z * [new tag] trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 2025-12-04T08:57:06.5361069Z * [new tag] trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 2025-12-04T08:57:06.5362295Z * [new tag] trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede 2025-12-04T08:57:06.5363625Z * [new tag] trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac 2025-12-04T08:57:06.5364949Z * [new tag] trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb 2025-12-04T08:57:06.5366182Z * [new tag] trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 2025-12-04T08:57:06.5367790Z * [new tag] trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 2025-12-04T08:57:06.5369025Z * [new tag] trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09 2025-12-04T08:57:06.5370254Z * [new tag] trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 2025-12-04T08:57:06.5371615Z * [new tag] trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a 2025-12-04T08:57:06.5372965Z * [new tag] trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace 2025-12-04T08:57:06.5374260Z * [new tag] trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39 2025-12-04T08:57:06.5375537Z * [new tag] trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 
2025-12-04T08:57:06.5376852Z * [new tag] trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 2025-12-04T08:57:06.5378250Z * [new tag] trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf 2025-12-04T08:57:06.5379601Z * [new tag] trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 2025-12-04T08:57:06.5380828Z * [new tag] trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d 2025-12-04T08:57:06.5382166Z * [new tag] trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 2025-12-04T08:57:06.5383454Z * [new tag] trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 2025-12-04T08:57:06.5384775Z * [new tag] trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e 2025-12-04T08:57:06.5386210Z * [new tag] trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a 2025-12-04T08:57:06.5387515Z * [new tag] trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b 2025-12-04T08:57:06.5388824Z * [new tag] trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec 2025-12-04T08:57:06.5390142Z * [new tag] trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf 2025-12-04T08:57:06.5391462Z * [new tag] trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd 2025-12-04T08:57:06.5392809Z * [new tag] trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889 2025-12-04T08:57:06.5394243Z * [new tag] trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77 2025-12-04T08:57:06.5395731Z * [new tag] 
trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c 2025-12-04T08:57:06.5397081Z * [new tag] trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b 2025-12-04T08:57:06.5398257Z * [new tag] trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235 2025-12-04T08:57:06.5399708Z * [new tag] trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e 2025-12-04T08:57:06.5401223Z * [new tag] trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc 2025-12-04T08:57:06.5402488Z * [new tag] trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 2025-12-04T08:57:06.5403772Z * [new tag] trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf 2025-12-04T08:57:06.5405102Z * [new tag] trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e 2025-12-04T08:57:06.5406393Z * [new tag] trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e 2025-12-04T08:57:06.5408150Z * [new tag] trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 2025-12-04T08:57:06.5409447Z * [new tag] trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 2025-12-04T08:57:06.5410774Z * [new tag] trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 2025-12-04T08:57:06.5412067Z * [new tag] trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598 2025-12-04T08:57:06.5413394Z * [new tag] trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 2025-12-04T08:57:06.5414706Z * [new tag] trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> 
trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 2025-12-04T08:57:06.5416024Z * [new tag] trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 2025-12-04T08:57:06.5417433Z * [new tag] trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 2025-12-04T08:57:06.5418810Z * [new tag] trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 2025-12-04T08:57:06.5420176Z * [new tag] trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 2025-12-04T08:57:06.5421374Z * [new tag] trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 2025-12-04T08:57:06.5422705Z * [new tag] trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b 2025-12-04T08:57:06.5424092Z * [new tag] trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 2025-12-04T08:57:06.5425817Z * [new tag] trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 2025-12-04T08:57:06.5427111Z * [new tag] trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 2025-12-04T08:57:06.5428433Z * [new tag] trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:06.5429705Z * [new tag] v0.1.1 -> v0.1.1 2025-12-04T08:57:06.5430862Z * [new tag] v0.1.10 -> v0.1.10 2025-12-04T08:57:06.5431914Z * [new tag] v0.1.11 -> v0.1.11 2025-12-04T08:57:06.5433126Z * [new tag] v0.1.12 -> v0.1.12 2025-12-04T08:57:06.5434337Z * [new tag] v0.1.2 -> v0.1.2 2025-12-04T08:57:06.5435531Z * [new tag] v0.1.3 -> v0.1.3 2025-12-04T08:57:06.5436705Z * [new tag] v0.1.4 -> v0.1.4 2025-12-04T08:57:06.5437926Z * [new tag] v0.1.5 -> v0.1.5 2025-12-04T08:57:06.5439144Z * [new tag] v0.1.6 -> v0.1.6 2025-12-04T08:57:06.5440492Z * [new tag] 
v0.1.7 -> v0.1.7 2025-12-04T08:57:06.5441719Z * [new tag] v0.1.8 -> v0.1.8 2025-12-04T08:57:06.5442889Z * [new tag] v0.1.9 -> v0.1.9 2025-12-04T08:57:06.5444136Z * [new tag] v0.2.0 -> v0.2.0 2025-12-04T08:57:06.5445428Z * [new tag] v0.3.0 -> v0.3.0 2025-12-04T08:57:06.5446673Z * [new tag] v0.3.1 -> v0.3.1 2025-12-04T08:57:06.5447879Z * [new tag] v0.4.0 -> v0.4.0 2025-12-04T08:57:06.5449109Z * [new tag] v0.4.1 -> v0.4.1 2025-12-04T08:57:06.5450346Z * [new tag] v1.0.0 -> v1.0.0 2025-12-04T08:57:06.5451590Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-12-04T08:57:06.5452835Z * [new tag] v1.0.1 -> v1.0.1 2025-12-04T08:57:06.5454074Z * [new tag] v1.0rc0 -> v1.0rc0 2025-12-04T08:57:06.5455232Z * [new tag] v1.0rc1 -> v1.0rc1 2025-12-04T08:57:06.5456397Z * [new tag] v1.1.0 -> v1.1.0 2025-12-04T08:57:06.5457624Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-12-04T08:57:06.5459037Z * [new tag] v1.10.0 -> v1.10.0 2025-12-04T08:57:06.5460453Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-12-04T08:57:06.5461690Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-12-04T08:57:06.5462785Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-12-04T08:57:06.5464397Z * [new tag] v1.10.1 -> v1.10.1 2025-12-04T08:57:06.5465549Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-12-04T08:57:06.5466567Z * [new tag] v1.10.2 -> v1.10.2 2025-12-04T08:57:06.5467520Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-12-04T08:57:06.5468812Z * [new tag] v1.11.0 -> v1.11.0 2025-12-04T08:57:06.5470175Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-12-04T08:57:06.5471448Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-12-04T08:57:06.5472839Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-12-04T08:57:06.5474143Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-12-04T08:57:06.5475411Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-12-04T08:57:06.5476527Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-12-04T08:57:06.5477566Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-12-04T08:57:06.5478803Z * [new tag] v1.12.0 -> v1.12.0 2025-12-04T08:57:06.5480085Z * [new tag] 
v1.12.0-rc1 -> v1.12.0-rc1 2025-12-04T08:57:06.5481522Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-12-04T08:57:06.5482757Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-12-04T08:57:06.5484012Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-12-04T08:57:06.5485241Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-12-04T08:57:06.5486511Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-12-04T08:57:06.5487683Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-12-04T08:57:06.5488783Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-12-04T08:57:06.5489816Z * [new tag] v1.12.1 -> v1.12.1 2025-12-04T08:57:06.5491111Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-12-04T08:57:06.5492363Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-12-04T08:57:06.5493683Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-12-04T08:57:06.5494965Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-12-04T08:57:06.5496072Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-12-04T08:57:06.5497305Z * [new tag] v1.13.0 -> v1.13.0 2025-12-04T08:57:06.5498509Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-12-04T08:57:06.5499718Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-12-04T08:57:06.5500897Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-12-04T08:57:06.5502275Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-12-04T08:57:06.5503395Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-12-04T08:57:06.5504398Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-12-04T08:57:06.5505673Z * [new tag] v1.13.1 -> v1.13.1 2025-12-04T08:57:06.5506801Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-12-04T08:57:06.5507989Z * [new tag] v1.2.0 -> v1.2.0 2025-12-04T08:57:06.5509241Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-12-04T08:57:06.5510453Z * [new tag] v1.3.0 -> v1.3.0 2025-12-04T08:57:06.5511734Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-12-04T08:57:06.5512844Z * [new tag] v1.3.1 -> v1.3.1 2025-12-04T08:57:06.5514017Z * [new tag] v1.4.0 -> v1.4.0 2025-12-04T08:57:06.5515225Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-12-04T08:57:06.5516329Z * [new tag] v1.4.1 -> v1.4.1 
2025-12-04T08:57:06.5517710Z * [new tag] v1.5.0 -> v1.5.0 2025-12-04T08:57:06.5519024Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-12-04T08:57:06.5520453Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-12-04T08:57:06.5521746Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-12-04T08:57:06.5522921Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-12-04T08:57:06.5524041Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-12-04T08:57:06.5525265Z * [new tag] v1.5.1 -> v1.5.1 2025-12-04T08:57:06.5526394Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-12-04T08:57:06.5527396Z * [new tag] v1.6.0 -> v1.6.0 2025-12-04T08:57:06.5528662Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-12-04T08:57:06.5529932Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-12-04T08:57:06.5531198Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-12-04T08:57:06.5532673Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-12-04T08:57:06.5533924Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-12-04T08:57:06.5535106Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-12-04T08:57:06.5536198Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-12-04T08:57:06.5537372Z * [new tag] v1.7.0 -> v1.7.0 2025-12-04T08:57:06.5538614Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-12-04T08:57:06.5540004Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-12-04T08:57:06.5541235Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-12-04T08:57:06.5542691Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-12-04T08:57:06.5543996Z * [new tag] v1.7.1 -> v1.7.1 2025-12-04T08:57:06.5545344Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-12-04T08:57:06.5546627Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-12-04T08:57:06.5547776Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-12-04T08:57:06.5548978Z * [new tag] v1.8.0 -> v1.8.0 2025-12-04T08:57:06.5550100Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-12-04T08:57:06.5551311Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-12-04T08:57:06.5552574Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-12-04T08:57:06.5553794Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-12-04T08:57:06.5554894Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 
2025-12-04T08:57:06.5555994Z * [new tag] v1.8.1 -> v1.8.1 2025-12-04T08:57:06.5557197Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-12-04T08:57:06.5558334Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-12-04T08:57:06.5559472Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-12-04T08:57:06.5561188Z * [new tag] v1.8.2 -> v1.8.2 2025-12-04T08:57:06.5562324Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-12-04T08:57:06.5563502Z * [new tag] v1.9.0 -> v1.9.0 2025-12-04T08:57:06.5564762Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-12-04T08:57:06.5566041Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-12-04T08:57:06.5567344Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-12-04T08:57:06.5568494Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-12-04T08:57:06.5569685Z * [new tag] v1.9.1 -> v1.9.1 2025-12-04T08:57:06.5571061Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-12-04T08:57:06.5572209Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-12-04T08:57:06.5573442Z * [new tag] v2.0.0 -> v2.0.0 2025-12-04T08:57:06.5574667Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-12-04T08:57:06.5575996Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-12-04T08:57:06.5577278Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-12-04T08:57:06.5578507Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-12-04T08:57:06.5579797Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-12-04T08:57:06.5580907Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-12-04T08:57:06.5582154Z * [new tag] v2.0.1 -> v2.0.1 2025-12-04T08:57:06.5583552Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-12-04T08:57:06.5584519Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-12-04T08:57:06.5585747Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-12-04T08:57:06.5586850Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-12-04T08:57:06.5588430Z * [new tag] v2.1.0 -> v2.1.0 2025-12-04T08:57:06.5589668Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-12-04T08:57:06.5590933Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-12-04T08:57:06.5592221Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-12-04T08:57:06.5593643Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 
2025-12-04T08:57:06.5594927Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-12-04T08:57:06.5596065Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-12-04T08:57:06.5597278Z * [new tag] v2.1.1 -> v2.1.1 2025-12-04T08:57:06.5598652Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-12-04T08:57:06.5600114Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-12-04T08:57:06.5601535Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-12-04T08:57:06.5602818Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-12-04T08:57:06.5604047Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-12-04T08:57:06.5605173Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-12-04T08:57:06.5606344Z * [new tag] v2.1.2 -> v2.1.2 2025-12-04T08:57:06.5607647Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-12-04T08:57:06.5608972Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-12-04T08:57:06.5610114Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-12-04T08:57:06.5611333Z * [new tag] v2.2.0 -> v2.2.0 2025-12-04T08:57:06.5612565Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-12-04T08:57:06.5613782Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-12-04T08:57:06.5615022Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-12-04T08:57:06.5616255Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-12-04T08:57:06.5617705Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-12-04T08:57:06.5618989Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-12-04T08:57:06.5620076Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-12-04T08:57:06.5621159Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-12-04T08:57:06.5622853Z * [new tag] v2.2.1 -> v2.2.1 2025-12-04T08:57:06.5624156Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-12-04T08:57:06.5625423Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-12-04T08:57:06.5626588Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-12-04T08:57:06.5627678Z * [new tag] v2.2.2 -> v2.2.2 2025-12-04T08:57:06.5628964Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-12-04T08:57:06.5630111Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-12-04T08:57:06.5631119Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-12-04T08:57:06.5632421Z * [new tag] v2.3.0 -> v2.3.0 
2025-12-04T08:57:06.5633688Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-12-04T08:57:06.5635123Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-12-04T08:57:06.5636561Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-12-04T08:57:06.5637702Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-12-04T08:57:06.5638951Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-12-04T08:57:06.5640436Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-12-04T08:57:06.5641713Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-12-04T08:57:06.5642979Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-12-04T08:57:06.5644103Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-12-04T08:57:06.5645284Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-12-04T08:57:06.5646589Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-12-04T08:57:06.5647761Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-12-04T08:57:06.5648847Z * [new tag] v2.3.1 -> v2.3.1 2025-12-04T08:57:06.5650080Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-12-04T08:57:06.5651349Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-12-04T08:57:06.5652694Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-12-04T08:57:06.5654056Z * [new tag] v2.4.0 -> v2.4.0 2025-12-04T08:57:06.5655324Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-12-04T08:57:06.5656574Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-12-04T08:57:06.5657836Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-12-04T08:57:06.5659079Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-12-04T08:57:06.5660371Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-12-04T08:57:06.5661645Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-12-04T08:57:06.5662885Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-12-04T08:57:06.5664176Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-12-04T08:57:06.5665459Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-12-04T08:57:06.5666621Z * [new tag] v2.4.1 -> v2.4.1 2025-12-04T08:57:06.5667893Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-12-04T08:57:06.5669170Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-12-04T08:57:06.5670464Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-12-04T08:57:06.5671741Z * [new 
tag] v2.5.0 -> v2.5.0 2025-12-04T08:57:06.5672960Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-12-04T08:57:06.5674074Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-12-04T08:57:06.5675292Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-12-04T08:57:06.5676556Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-12-04T08:57:06.5677794Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-12-04T08:57:06.5679074Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-12-04T08:57:06.5680561Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-12-04T08:57:06.5681831Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-12-04T08:57:06.5683123Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-12-04T08:57:06.5684377Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-12-04T08:57:06.5685471Z * [new tag] v2.5.1 -> v2.5.1 2025-12-04T08:57:06.5686611Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-12-04T08:57:06.5687509Z * [new tag] v2.6.0 -> v2.6.0 2025-12-04T08:57:06.5688853Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-12-04T08:57:06.5690234Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-12-04T08:57:06.5691508Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-12-04T08:57:06.5692746Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-12-04T08:57:06.5694214Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-12-04T08:57:06.5695580Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-12-04T08:57:06.5696885Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-12-04T08:57:06.5698191Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-12-04T08:57:06.5699526Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-12-04T08:57:06.5700991Z * [new tag] v2.7.0 -> v2.7.0 2025-12-04T08:57:06.5702261Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-12-04T08:57:06.5703770Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-12-04T08:57:06.5705090Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-12-04T08:57:06.5706382Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-12-04T08:57:06.5707660Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-12-04T08:57:06.5708946Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-12-04T08:57:06.5710212Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 
2025-12-04T08:57:06.5711503Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-12-04T08:57:06.5712810Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-12-04T08:57:06.5714225Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-12-04T08:57:06.5715361Z * [new tag] v2.7.1 -> v2.7.1 2025-12-04T08:57:06.5716677Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-12-04T08:57:06.5720516Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-12-04T08:57:06.5721939Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-12-04T08:57:06.5723221Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-12-04T08:57:06.5724570Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-12-04T08:57:06.5725750Z * [new tag] v2.8.0 -> v2.8.0 2025-12-04T08:57:06.5726999Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-12-04T08:57:06.5728309Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-12-04T08:57:06.5729653Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-12-04T08:57:06.5731041Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-12-04T08:57:06.5732397Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-12-04T08:57:06.5733751Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-12-04T08:57:06.5735060Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-12-04T08:57:06.5736323Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-12-04T08:57:06.5737668Z * [new tag] v2.9.0 -> v2.9.0 2025-12-04T08:57:06.5738986Z * [new tag] v2.9.0-rc1 -> v2.9.0-rc1 2025-12-04T08:57:06.5740264Z * [new tag] v2.9.0-rc10 -> v2.9.0-rc10 2025-12-04T08:57:06.5741614Z * [new tag] v2.9.0-rc11 -> v2.9.0-rc11 2025-12-04T08:57:06.5743237Z * [new tag] v2.9.0-rc2 -> v2.9.0-rc2 2025-12-04T08:57:06.5744461Z * [new tag] v2.9.0-rc3 -> v2.9.0-rc3 2025-12-04T08:57:06.5745827Z * [new tag] v2.9.0-rc4 -> v2.9.0-rc4 2025-12-04T08:57:06.5747071Z * [new tag] v2.9.0-rc5 -> v2.9.0-rc5 2025-12-04T08:57:06.5748545Z * [new tag] v2.9.0-rc6 -> v2.9.0-rc6 2025-12-04T08:57:06.5749835Z * [new tag] v2.9.0-rc7 -> v2.9.0-rc7 2025-12-04T08:57:06.5751266Z * [new tag] v2.9.0-rc8 -> v2.9.0-rc8 2025-12-04T08:57:06.5752429Z * [new tag] v2.9.0-rc9 -> v2.9.0-rc9 2025-12-04T08:57:06.5753524Z * [new 
tag] v2.9.1 -> v2.9.1 2025-12-04T08:57:06.5754804Z * [new tag] v2.9.1-rc1 -> v2.9.1-rc1 2025-12-04T08:57:06.5756101Z * [new tag] v2.9.1-rc2 -> v2.9.1-rc2 2025-12-04T08:57:06.5757821Z * [new tag] viable/strict/1759343184 -> viable/strict/1759343184 2025-12-04T08:57:06.5759121Z * [new tag] viable/strict/1759346540 -> viable/strict/1759346540 2025-12-04T08:57:06.5760475Z * [new tag] viable/strict/1759348181 -> viable/strict/1759348181 2025-12-04T08:57:06.5761682Z * [new tag] viable/strict/1759350324 -> viable/strict/1759350324 2025-12-04T08:57:06.5762871Z * [new tag] viable/strict/1759351793 -> viable/strict/1759351793 2025-12-04T08:57:06.5764124Z * [new tag] viable/strict/1759353844 -> viable/strict/1759353844 2025-12-04T08:57:06.5765347Z * [new tag] viable/strict/1759355374 -> viable/strict/1759355374 2025-12-04T08:57:06.5766557Z * [new tag] viable/strict/1759357472 -> viable/strict/1759357472 2025-12-04T08:57:06.5767754Z * [new tag] viable/strict/1759361002 -> viable/strict/1759361002 2025-12-04T08:57:06.5769213Z * [new tag] viable/strict/1759362585 -> viable/strict/1759362585 2025-12-04T08:57:06.5770657Z * [new tag] viable/strict/1759365359 -> viable/strict/1759365359 2025-12-04T08:57:06.5771950Z * [new tag] viable/strict/1759370089 -> viable/strict/1759370089 2025-12-04T08:57:06.5773324Z * [new tag] viable/strict/1759377554 -> viable/strict/1759377554 2025-12-04T08:57:06.5774615Z * [new tag] viable/strict/1759379133 -> viable/strict/1759379133 2025-12-04T08:57:06.5775903Z * [new tag] viable/strict/1759389871 -> viable/strict/1759389871 2025-12-04T08:57:06.5777188Z * [new tag] viable/strict/1759393562 -> viable/strict/1759393562 2025-12-04T08:57:06.5778490Z * [new tag] viable/strict/1759395076 -> viable/strict/1759395076 2025-12-04T08:57:06.5779828Z * [new tag] viable/strict/1759398579 -> viable/strict/1759398579 2025-12-04T08:57:06.5781139Z * [new tag] viable/strict/1759404142 -> viable/strict/1759404142 2025-12-04T08:57:06.5782409Z * [new tag] 
viable/strict/1759405773 -> viable/strict/1759405773 2025-12-04T08:57:06.5783652Z * [new tag] viable/strict/1759408041 -> viable/strict/1759408041 2025-12-04T08:57:06.5784946Z * [new tag] viable/strict/1759411593 -> viable/strict/1759411593 2025-12-04T08:57:06.5786194Z * [new tag] viable/strict/1759427395 -> viable/strict/1759427395 2025-12-04T08:57:06.5787441Z * [new tag] viable/strict/1759434582 -> viable/strict/1759434582 2025-12-04T08:57:06.5788705Z * [new tag] viable/strict/1759436720 -> viable/strict/1759436720 2025-12-04T08:57:06.5790014Z * [new tag] viable/strict/1759440219 -> viable/strict/1759440219 2025-12-04T08:57:06.5791285Z * [new tag] viable/strict/1759441948 -> viable/strict/1759441948 2025-12-04T08:57:06.5792643Z * [new tag] viable/strict/1759443860 -> viable/strict/1759443860 2025-12-04T08:57:06.5793880Z * [new tag] viable/strict/1759445377 -> viable/strict/1759445377 2025-12-04T08:57:06.5795137Z * [new tag] viable/strict/1759447415 -> viable/strict/1759447415 2025-12-04T08:57:06.5796396Z * [new tag] viable/strict/1759451750 -> viable/strict/1759451750 2025-12-04T08:57:06.5797671Z * [new tag] viable/strict/1759453910 -> viable/strict/1759453910 2025-12-04T08:57:06.5798996Z * [new tag] viable/strict/1759456483 -> viable/strict/1759456483 2025-12-04T08:57:06.5800414Z * [new tag] viable/strict/1759459279 -> viable/strict/1759459279 2025-12-04T08:57:06.5801672Z * [new tag] viable/strict/1759460742 -> viable/strict/1759460742 2025-12-04T08:57:06.5802942Z * [new tag] viable/strict/1759462025 -> viable/strict/1759462025 2025-12-04T08:57:06.5804239Z * [new tag] viable/strict/1759469086 -> viable/strict/1759469086 2025-12-04T08:57:06.5805603Z * [new tag] viable/strict/1759470581 -> viable/strict/1759470581 2025-12-04T08:57:06.5806927Z * [new tag] viable/strict/1759472786 -> viable/strict/1759472786 2025-12-04T08:57:06.5808199Z * [new tag] viable/strict/1759476294 -> viable/strict/1759476294 2025-12-04T08:57:06.5809473Z * [new tag] viable/strict/1759479963 
-> viable/strict/1759479963 2025-12-04T08:57:06.5810755Z * [new tag] viable/strict/1759492177 -> viable/strict/1759492177 2025-12-04T08:57:06.5812017Z * [new tag] viable/strict/1759519278 -> viable/strict/1759519278 2025-12-04T08:57:06.5813339Z * [new tag] viable/strict/1759524580 -> viable/strict/1759524580 2025-12-04T08:57:06.5814654Z * [new tag] viable/strict/1759528193 -> viable/strict/1759528193 2025-12-04T08:57:06.5816079Z * [new tag] viable/strict/1759533797 -> viable/strict/1759533797 2025-12-04T08:57:06.5817488Z * [new tag] viable/strict/1759542780 -> viable/strict/1759542780 2025-12-04T08:57:06.5818883Z * [new tag] viable/strict/1759549779 -> viable/strict/1759549779 2025-12-04T08:57:06.5820159Z * [new tag] viable/strict/1759555455 -> viable/strict/1759555455 2025-12-04T08:57:06.5821446Z * [new tag] viable/strict/1759559176 -> viable/strict/1759559176 2025-12-04T08:57:06.5822710Z * [new tag] viable/strict/1759560629 -> viable/strict/1759560629 2025-12-04T08:57:06.5823999Z * [new tag] viable/strict/1759569848 -> viable/strict/1759569848 2025-12-04T08:57:06.5825432Z * [new tag] viable/strict/1759571382 -> viable/strict/1759571382 2025-12-04T08:57:06.5826745Z * [new tag] viable/strict/1759573474 -> viable/strict/1759573474 2025-12-04T08:57:06.5827997Z * [new tag] viable/strict/1759618187 -> viable/strict/1759618187 2025-12-04T08:57:06.5829301Z * [new tag] viable/strict/1759626742 -> viable/strict/1759626742 2025-12-04T08:57:06.5830679Z * [new tag] viable/strict/1759632427 -> viable/strict/1759632427 2025-12-04T08:57:06.5831944Z * [new tag] viable/strict/1759634971 -> viable/strict/1759634971 2025-12-04T08:57:06.5833226Z * [new tag] viable/strict/1759661382 -> viable/strict/1759661382 2025-12-04T08:57:06.5834499Z * [new tag] viable/strict/1759663294 -> viable/strict/1759663294 2025-12-04T08:57:06.5835703Z * [new tag] viable/strict/1759708178 -> viable/strict/1759708178 2025-12-04T08:57:06.5837535Z * [new tag] viable/strict/1759715695 -> 
viable/strict/1759715695 2025-12-04T08:57:06.5838936Z * [new tag] viable/strict/1759728293 -> viable/strict/1759728293 2025-12-04T08:57:06.5840236Z * [new tag] viable/strict/1759735513 -> viable/strict/1759735513 2025-12-04T08:57:06.5841601Z * [new tag] viable/strict/1759739177 -> viable/strict/1759739177 2025-12-04T08:57:06.5842843Z * [new tag] viable/strict/1759758635 -> viable/strict/1759758635 2025-12-04T08:57:06.5844093Z * [new tag] viable/strict/1759765784 -> viable/strict/1759765784 2025-12-04T08:57:06.5845351Z * [new tag] viable/strict/1759767948 -> viable/strict/1759767948 2025-12-04T08:57:06.5846666Z * [new tag] viable/strict/1759771461 -> viable/strict/1759771461 2025-12-04T08:57:06.5847905Z * [new tag] viable/strict/1759776706 -> viable/strict/1759776706 2025-12-04T08:57:06.5849215Z * [new tag] viable/strict/1759782317 -> viable/strict/1759782317 2025-12-04T08:57:06.5850555Z * [new tag] viable/strict/1759783777 -> viable/strict/1759783777 2025-12-04T08:57:06.5851904Z * [new tag] viable/strict/1759785815 -> viable/strict/1759785815 2025-12-04T08:57:06.5853170Z * [new tag] viable/strict/1759789459 -> viable/strict/1759789459 2025-12-04T08:57:06.5854432Z * [new tag] viable/strict/1759790974 -> viable/strict/1759790974 2025-12-04T08:57:06.5855679Z * [new tag] viable/strict/1759794583 -> viable/strict/1759794583 2025-12-04T08:57:06.5856970Z * [new tag] viable/strict/1759797408 -> viable/strict/1759797408 2025-12-04T08:57:06.5858266Z * [new tag] viable/strict/1759799518 -> viable/strict/1759799518 2025-12-04T08:57:06.5859533Z * [new tag] viable/strict/1759804909 -> viable/strict/1759804909 2025-12-04T08:57:06.5860808Z * [new tag] viable/strict/1759807643 -> viable/strict/1759807643 2025-12-04T08:57:06.5862136Z * [new tag] viable/strict/1759809089 -> viable/strict/1759809089 2025-12-04T08:57:06.5863387Z * [new tag] viable/strict/1759811145 -> viable/strict/1759811145 2025-12-04T08:57:06.5864690Z * [new tag] viable/strict/1759812581 -> viable/strict/1759812581 
2025-12-04T08:57:06.5865974Z * [new tag] viable/strict/1759814683 -> viable/strict/1759814683 2025-12-04T08:57:06.5867277Z * [new tag] viable/strict/1759821889 -> viable/strict/1759821889 2025-12-04T08:57:06.5868564Z * [new tag] viable/strict/1759823376 -> viable/strict/1759823376 2025-12-04T08:57:06.5869845Z * [new tag] viable/strict/1759827107 -> viable/strict/1759827107 2025-12-04T08:57:06.5871124Z * [new tag] viable/strict/1759830577 -> viable/strict/1759830577 2025-12-04T08:57:06.5872461Z * [new tag] viable/strict/1759832720 -> viable/strict/1759832720 2025-12-04T08:57:06.5873740Z * [new tag] viable/strict/1759842063 -> viable/strict/1759842063 2025-12-04T08:57:06.5874978Z * [new tag] viable/strict/1759847121 -> viable/strict/1759847121 2025-12-04T08:57:06.5876491Z * [new tag] viable/strict/1759850721 -> viable/strict/1759850721 2025-12-04T08:57:06.5877763Z * [new tag] viable/strict/1759857870 -> viable/strict/1759857870 2025-12-04T08:57:06.5879071Z * [new tag] viable/strict/1759863143 -> viable/strict/1759863143 2025-12-04T08:57:06.5880434Z * [new tag] viable/strict/1759875874 -> viable/strict/1759875874 2025-12-04T08:57:06.5881644Z * [new tag] viable/strict/1759877385 -> viable/strict/1759877385 2025-12-04T08:57:06.5882914Z * [new tag] viable/strict/1759883801 -> viable/strict/1759883801 2025-12-04T08:57:06.5884195Z * [new tag] viable/strict/1759885922 -> viable/strict/1759885922 2025-12-04T08:57:06.5885590Z * [new tag] viable/strict/1759888488 -> viable/strict/1759888488 2025-12-04T08:57:06.5886804Z * [new tag] viable/strict/1759895471 -> viable/strict/1759895471 2025-12-04T08:57:06.5888077Z * [new tag] viable/strict/1759904803 -> viable/strict/1759904803 2025-12-04T08:57:06.5889537Z * [new tag] viable/strict/1759908300 -> viable/strict/1759908300 2025-12-04T08:57:06.5890887Z * [new tag] viable/strict/1759915520 -> viable/strict/1759915520 2025-12-04T08:57:06.5892164Z * [new tag] viable/strict/1759916978 -> viable/strict/1759916978 
2025-12-04T08:57:06.5893381Z * [new tag] viable/strict/1759930024 -> viable/strict/1759930024 2025-12-04T08:57:06.5894634Z * [new tag] viable/strict/1759948122 -> viable/strict/1759948122 2025-12-04T08:57:06.5895914Z * [new tag] viable/strict/1759952983 -> viable/strict/1759952983 2025-12-04T08:57:06.5897302Z * [new tag] viable/strict/1759955121 -> viable/strict/1759955121 2025-12-04T08:57:06.5898890Z * [new tag] viable/strict/1759962298 -> viable/strict/1759962298 2025-12-04T08:57:06.5900216Z * [new tag] viable/strict/1759965837 -> viable/strict/1759965837 2025-12-04T08:57:06.5901499Z * [new tag] viable/strict/1759970213 -> viable/strict/1759970213 2025-12-04T08:57:06.5902880Z * [new tag] viable/strict/1759974894 -> viable/strict/1759974894 2025-12-04T08:57:06.5904164Z * [new tag] viable/strict/1759977763 -> viable/strict/1759977763 2025-12-04T08:57:06.5905515Z * [new tag] viable/strict/1759979241 -> viable/strict/1759979241 2025-12-04T08:57:06.5906797Z * [new tag] viable/strict/1759985417 -> viable/strict/1759985417 2025-12-04T08:57:06.5908076Z * [new tag] viable/strict/1759987490 -> viable/strict/1759987490 2025-12-04T08:57:06.5909363Z * [new tag] viable/strict/1759996180 -> viable/strict/1759996180 2025-12-04T08:57:06.5922925Z * [new tag] viable/strict/1760065682 -> viable/strict/1760065682 2025-12-04T08:57:06.5923303Z * [new tag] viable/strict/1760066894 -> viable/strict/1760066894 2025-12-04T08:57:06.5923483Z * [new tag] viable/strict/1760070345 -> viable/strict/1760070345 2025-12-04T08:57:06.5923630Z * [new tag] viable/strict/1760089782 -> viable/strict/1760089782 2025-12-04T08:57:06.5923765Z * [new tag] viable/strict/1760091921 -> viable/strict/1760091921 2025-12-04T08:57:06.5923915Z * [new tag] viable/strict/1760127924 -> viable/strict/1760127924 2025-12-04T08:57:06.5924052Z * [new tag] viable/strict/1760129489 -> viable/strict/1760129489 2025-12-04T08:57:06.5924180Z * [new tag] viable/strict/1760132980 -> viable/strict/1760132980 
2025-12-04T08:57:06.5924421Z * [new tag] viable/strict/1760135060 -> viable/strict/1760135060 2025-12-04T08:57:06.5924653Z * [new tag] viable/strict/1760215782 -> viable/strict/1760215782 2025-12-04T08:57:06.5924812Z * [new tag] viable/strict/1760273849 -> viable/strict/1760273849 2025-12-04T08:57:06.5925999Z * [new tag] viable/strict/1760275517 -> viable/strict/1760275517 2025-12-04T08:57:06.5927214Z * [new tag] viable/strict/1760276979 -> viable/strict/1760276979 2025-12-04T08:57:06.5928469Z * [new tag] viable/strict/1760279007 -> viable/strict/1760279007 2025-12-04T08:57:06.5929821Z * [new tag] viable/strict/1760286328 -> viable/strict/1760286328 2025-12-04T08:57:06.5931037Z * [new tag] viable/strict/1760493304 -> viable/strict/1760493304 2025-12-04T08:57:06.5932322Z * [new tag] viable/strict/1760496298 -> viable/strict/1760496298 2025-12-04T08:57:06.5933813Z * [new tag] viable/strict/1760518396 -> viable/strict/1760518396 2025-12-04T08:57:06.5935083Z * [new tag] viable/strict/1760534864 -> viable/strict/1760534864 2025-12-04T08:57:06.5936296Z * [new tag] viable/strict/1760549062 -> viable/strict/1760549062 2025-12-04T08:57:06.5937597Z * [new tag] viable/strict/1760552799 -> viable/strict/1760552799 2025-12-04T08:57:06.5938933Z * [new tag] viable/strict/1760554355 -> viable/strict/1760554355 2025-12-04T08:57:06.5940289Z * [new tag] viable/strict/1760556275 -> viable/strict/1760556275 2025-12-04T08:57:06.5941578Z * [new tag] viable/strict/1760564979 -> viable/strict/1760564979 2025-12-04T08:57:06.5942906Z * [new tag] viable/strict/1760567049 -> viable/strict/1760567049 2025-12-04T08:57:06.5944502Z * [new tag] viable/strict/1760568585 -> viable/strict/1760568585 2025-12-04T08:57:06.5945838Z * [new tag] viable/strict/1760570630 -> viable/strict/1760570630 2025-12-04T08:57:06.5947117Z * [new tag] viable/strict/1760572180 -> viable/strict/1760572180 2025-12-04T08:57:06.5948371Z * [new tag] viable/strict/1760575094 -> viable/strict/1760575094 
2025-12-04T08:57:06.5949754Z * [new tag] viable/strict/1760579709 -> viable/strict/1760579709 2025-12-04T08:57:06.5951407Z * [new tag] viable/strict/1760582614 -> viable/strict/1760582614 2025-12-04T08:57:06.5952750Z * [new tag] viable/strict/1760586815 -> viable/strict/1760586815 2025-12-04T08:57:06.5953952Z * [new tag] viable/strict/1760588829 -> viable/strict/1760588829 2025-12-04T08:57:06.5955191Z * [new tag] viable/strict/1760590200 -> viable/strict/1760590200 2025-12-04T08:57:06.5956516Z * [new tag] viable/strict/1760592311 -> viable/strict/1760592311 2025-12-04T08:57:06.5957782Z * [new tag] viable/strict/1760619733 -> viable/strict/1760619733 2025-12-04T08:57:06.5959001Z * [new tag] viable/strict/1760628335 -> viable/strict/1760628335 2025-12-04T08:57:06.5960314Z * [new tag] viable/strict/1760635490 -> viable/strict/1760635490 2025-12-04T08:57:06.5961617Z * [new tag] viable/strict/1760640743 -> viable/strict/1760640743 2025-12-04T08:57:06.5962890Z * [new tag] viable/strict/1760642528 -> viable/strict/1760642528 2025-12-04T08:57:06.5964174Z * [new tag] viable/strict/1760646330 -> viable/strict/1760646330 2025-12-04T08:57:06.5965434Z * [new tag] viable/strict/1760666101 -> viable/strict/1760666101 2025-12-04T08:57:06.5966737Z * [new tag] viable/strict/1760668990 -> viable/strict/1760668990 2025-12-04T08:57:06.5968107Z * [new tag] viable/strict/1760670600 -> viable/strict/1760670600 2025-12-04T08:57:06.5969371Z * [new tag] viable/strict/1760671704 -> viable/strict/1760671704 2025-12-04T08:57:06.5970660Z * [new tag] viable/strict/1760673121 -> viable/strict/1760673121 2025-12-04T08:57:06.5971947Z * [new tag] viable/strict/1760675352 -> viable/strict/1760675352 2025-12-04T08:57:06.5973231Z * [new tag] viable/strict/1760696731 -> viable/strict/1760696731 2025-12-04T08:57:06.5975681Z * [new tag] viable/strict/1760723515 -> viable/strict/1760723515 2025-12-04T08:57:06.5976990Z * [new tag] viable/strict/1760727234 -> viable/strict/1760727234 
2025-12-04T08:57:06.5978338Z * [new tag] viable/strict/1760730578 -> viable/strict/1760730578 2025-12-04T08:57:06.5979600Z * [new tag] viable/strict/1760732726 -> viable/strict/1760732726 2025-12-04T08:57:06.5980876Z * [new tag] viable/strict/1760734180 -> viable/strict/1760734180 2025-12-04T08:57:06.5982268Z * [new tag] viable/strict/1760736251 -> viable/strict/1760736251 2025-12-04T08:57:06.5983567Z * [new tag] viable/strict/1760737772 -> viable/strict/1760737772 2025-12-04T08:57:06.5984990Z * [new tag] viable/strict/1760758005 -> viable/strict/1760758005 2025-12-04T08:57:06.5986243Z * [new tag] viable/strict/1760761532 -> viable/strict/1760761532 2025-12-04T08:57:06.5987529Z * [new tag] viable/strict/1760802581 -> viable/strict/1760802581 2025-12-04T08:57:06.5988811Z * [new tag] viable/strict/1760827772 -> viable/strict/1760827772 2025-12-04T08:57:06.5990120Z * [new tag] viable/strict/1760834524 -> viable/strict/1760834524 2025-12-04T08:57:06.5991467Z * [new tag] viable/strict/1760845009 -> viable/strict/1760845009 2025-12-04T08:57:06.5992850Z * [new tag] viable/strict/1760876836 -> viable/strict/1760876836 2025-12-04T08:57:06.5994138Z * [new tag] viable/strict/1760880329 -> viable/strict/1760880329 2025-12-04T08:57:06.5995411Z * [new tag] viable/strict/1760888987 -> viable/strict/1760888987 2025-12-04T08:57:06.5996667Z * [new tag] viable/strict/1760912664 -> viable/strict/1760912664 2025-12-04T08:57:06.5997931Z * [new tag] viable/strict/1760925321 -> viable/strict/1760925321 2025-12-04T08:57:06.5999202Z * [new tag] viable/strict/1760931488 -> viable/strict/1760931488 2025-12-04T08:57:06.6000611Z * [new tag] viable/strict/1760932693 -> viable/strict/1760932693 2025-12-04T08:57:06.6001931Z * [new tag] viable/strict/1761004184 -> viable/strict/1761004184 2025-12-04T08:57:06.6003242Z * [new tag] viable/strict/1761014748 -> viable/strict/1761014748 2025-12-04T08:57:06.6004531Z * [new tag] viable/strict/1761017491 -> viable/strict/1761017491 
2025-12-04T08:57:06.6005815Z * [new tag] viable/strict/1761018806 -> viable/strict/1761018806 2025-12-04T08:57:06.6007601Z * [new tag] viable/strict/1761020754 -> viable/strict/1761020754 2025-12-04T08:57:06.6008893Z * [new tag] viable/strict/1761024303 -> viable/strict/1761024303 2025-12-04T08:57:06.6010186Z * [new tag] viable/strict/1761029582 -> viable/strict/1761029582 2025-12-04T08:57:06.6011454Z * [new tag] viable/strict/1761031535 -> viable/strict/1761031535 2025-12-04T08:57:06.6012723Z * [new tag] viable/strict/1761035196 -> viable/strict/1761035196 2025-12-04T08:57:06.6014094Z * [new tag] viable/strict/1761045825 -> viable/strict/1761045825 2025-12-04T08:57:06.6015410Z * [new tag] viable/strict/1761054796 -> viable/strict/1761054796 2025-12-04T08:57:06.6016699Z * [new tag] viable/strict/1761060314 -> viable/strict/1761060314 2025-12-04T08:57:06.6020300Z * [new tag] viable/strict/1761071198 -> viable/strict/1761071198 2025-12-04T08:57:06.6021650Z * [new tag] viable/strict/1761074628 -> viable/strict/1761074628 2025-12-04T08:57:06.6022980Z * [new tag] viable/strict/1761078351 -> viable/strict/1761078351 2025-12-04T08:57:06.6024269Z * [new tag] viable/strict/1761079822 -> viable/strict/1761079822 2025-12-04T08:57:06.6025535Z * [new tag] viable/strict/1761081873 -> viable/strict/1761081873 2025-12-04T08:57:06.6026859Z * [new tag] viable/strict/1761083392 -> viable/strict/1761083392 2025-12-04T08:57:06.6028198Z * [new tag] viable/strict/1761085465 -> viable/strict/1761085465 2025-12-04T08:57:06.6029490Z * [new tag] viable/strict/1761089099 -> viable/strict/1761089099 2025-12-04T08:57:06.6030776Z * [new tag] viable/strict/1761095535 -> viable/strict/1761095535 2025-12-04T08:57:06.6032270Z * [new tag] viable/strict/1761098119 -> viable/strict/1761098119 2025-12-04T08:57:06.6033808Z * [new tag] viable/strict/1761101330 -> viable/strict/1761101330 2025-12-04T08:57:06.6035060Z * [new tag] viable/strict/1761114425 -> viable/strict/1761114425 
2025-12-04T08:57:06.6036332Z * [new tag] viable/strict/1761116036 -> viable/strict/1761116036 2025-12-04T08:57:06.6037659Z * [new tag] viable/strict/1761119379 -> viable/strict/1761119379 2025-12-04T08:57:06.6038934Z * [new tag] viable/strict/1761121601 -> viable/strict/1761121601 2025-12-04T08:57:06.6040317Z * [new tag] viable/strict/1761123234 -> viable/strict/1761123234 2025-12-04T08:57:06.6041633Z * [new tag] viable/strict/1761126621 -> viable/strict/1761126621 2025-12-04T08:57:06.6043041Z * [new tag] viable/strict/1761132259 -> viable/strict/1761132259 2025-12-04T08:57:06.6044290Z * [new tag] viable/strict/1761146746 -> viable/strict/1761146746 2025-12-04T08:57:06.6045559Z * [new tag] viable/strict/1761164752 -> viable/strict/1761164752 2025-12-04T08:57:06.6046839Z * [new tag] viable/strict/1761166198 -> viable/strict/1761166198 2025-12-04T08:57:06.6048210Z * [new tag] viable/strict/1761175424 -> viable/strict/1761175424 2025-12-04T08:57:06.6049494Z * [new tag] viable/strict/1761176983 -> viable/strict/1761176983 2025-12-04T08:57:06.6050864Z * [new tag] viable/strict/1761179891 -> viable/strict/1761179891 2025-12-04T08:57:06.6052200Z * [new tag] viable/strict/1761181930 -> viable/strict/1761181930 2025-12-04T08:57:06.6053548Z * [new tag] viable/strict/1761184516 -> viable/strict/1761184516 2025-12-04T08:57:06.6054871Z * [new tag] viable/strict/1761190179 -> viable/strict/1761190179 2025-12-04T08:57:06.6056143Z * [new tag] viable/strict/1761193558 -> viable/strict/1761193558 2025-12-04T08:57:06.6057398Z * [new tag] viable/strict/1761207990 -> viable/strict/1761207990 2025-12-04T08:57:06.6058680Z * [new tag] viable/strict/1761229539 -> viable/strict/1761229539 2025-12-04T08:57:06.6060166Z * [new tag] viable/strict/1761244031 -> viable/strict/1761244031 2025-12-04T08:57:06.6061434Z * [new tag] viable/strict/1761248986 -> viable/strict/1761248986 2025-12-04T08:57:06.6062839Z * [new tag] viable/strict/1761259791 -> viable/strict/1761259791 
2025-12-04T08:57:06.6064111Z * [new tag] viable/strict/1761266139 -> viable/strict/1761266139 2025-12-04T08:57:06.6065426Z * [new tag] viable/strict/1761268316 -> viable/strict/1761268316 2025-12-04T08:57:06.6066726Z * [new tag] viable/strict/1761273805 -> viable/strict/1761273805 2025-12-04T08:57:06.6068034Z * [new tag] viable/strict/1761275261 -> viable/strict/1761275261 2025-12-04T08:57:06.6069377Z * [new tag] viable/strict/1761277913 -> viable/strict/1761277913 2025-12-04T08:57:06.6070714Z * [new tag] viable/strict/1761290701 -> viable/strict/1761290701 2025-12-04T08:57:06.6072028Z * [new tag] viable/strict/1761294396 -> viable/strict/1761294396 2025-12-04T08:57:06.6073301Z * [new tag] viable/strict/1761303047 -> viable/strict/1761303047 2025-12-04T08:57:06.6074612Z * [new tag] viable/strict/1761335388 -> viable/strict/1761335388 2025-12-04T08:57:06.6075912Z * [new tag] viable/strict/1761337551 -> viable/strict/1761337551 2025-12-04T08:57:06.6077180Z * [new tag] viable/strict/1761339007 -> viable/strict/1761339007 2025-12-04T08:57:06.6078441Z * [new tag] viable/strict/1761341050 -> viable/strict/1761341050 2025-12-04T08:57:06.6079854Z * [new tag] viable/strict/1761346188 -> viable/strict/1761346188 2025-12-04T08:57:06.6081416Z * [new tag] viable/strict/1761349792 -> viable/strict/1761349792 2025-12-04T08:57:06.6082674Z * [new tag] viable/strict/1761352620 -> viable/strict/1761352620 2025-12-04T08:57:06.6083957Z * [new tag] viable/strict/1761354730 -> viable/strict/1761354730 2025-12-04T08:57:06.6085261Z * [new tag] viable/strict/1761357298 -> viable/strict/1761357298 2025-12-04T08:57:06.6086531Z * [new tag] viable/strict/1761360201 -> viable/strict/1761360201 2025-12-04T08:57:06.6087804Z * [new tag] viable/strict/1761361753 -> viable/strict/1761361753 2025-12-04T08:57:06.6089139Z * [new tag] viable/strict/1761364351 -> viable/strict/1761364351 2025-12-04T08:57:06.6090447Z * [new tag] viable/strict/1761366338 -> viable/strict/1761366338 
2025-12-04T08:57:06.6091814Z * [new tag] viable/strict/1761367802 -> viable/strict/1761367802 2025-12-04T08:57:06.6093149Z * [new tag] viable/strict/1761369889 -> viable/strict/1761369889 2025-12-04T08:57:06.6094978Z * [new tag] viable/strict/1761371385 -> viable/strict/1761371385 2025-12-04T08:57:06.6096310Z * [new tag] viable/strict/1761373581 -> viable/strict/1761373581 2025-12-04T08:57:06.6097690Z * [new tag] viable/strict/1761375054 -> viable/strict/1761375054 2025-12-04T08:57:06.6099052Z * [new tag] viable/strict/1761421785 -> viable/strict/1761421785 2025-12-04T08:57:06.6100475Z * [new tag] viable/strict/1761434614 -> viable/strict/1761434614 2025-12-04T08:57:06.6101930Z * [new tag] viable/strict/1761439254 -> viable/strict/1761439254 2025-12-04T08:57:06.6103352Z * [new tag] viable/strict/1761454187 -> viable/strict/1761454187 2025-12-04T08:57:06.6104708Z * [new tag] viable/strict/1761459991 -> viable/strict/1761459991 2025-12-04T08:57:06.6106249Z * [new tag] viable/strict/1761470668 -> viable/strict/1761470668 2025-12-04T08:57:06.6107840Z * [new tag] viable/strict/1761472188 -> viable/strict/1761472188 2025-12-04T08:57:06.6109176Z * [new tag] viable/strict/1761503178 -> viable/strict/1761503178 2025-12-04T08:57:06.6110530Z * [new tag] viable/strict/1761517492 -> viable/strict/1761517492 2025-12-04T08:57:06.6111876Z * [new tag] viable/strict/1761518981 -> viable/strict/1761518981 2025-12-04T08:57:06.6113234Z * [new tag] viable/strict/1761533609 -> viable/strict/1761533609 2025-12-04T08:57:06.6114456Z * [new tag] viable/strict/1761546438 -> viable/strict/1761546438 2025-12-04T08:57:06.6115783Z * [new tag] viable/strict/1761548133 -> viable/strict/1761548133 2025-12-04T08:57:06.6117559Z * [new tag] viable/strict/1761555186 -> viable/strict/1761555186 2025-12-04T08:57:06.6118863Z * [new tag] viable/strict/1761557178 -> viable/strict/1761557178 2025-12-04T08:57:06.6120300Z * [new tag] viable/strict/1761560772 -> viable/strict/1761560772 
2025-12-04T08:57:06.6121636Z * [new tag] viable/strict/1761562266 -> viable/strict/1761562266 2025-12-04T08:57:06.6123043Z * [new tag] viable/strict/1761564260 -> viable/strict/1761564260 2025-12-04T08:57:06.6124293Z * [new tag] viable/strict/1761568072 -> viable/strict/1761568072 2025-12-04T08:57:06.6125562Z * [new tag] viable/strict/1761571683 -> viable/strict/1761571683 2025-12-04T08:57:06.6126783Z * [new tag] viable/strict/1761580199 -> viable/strict/1761580199 2025-12-04T08:57:06.6128070Z * [new tag] viable/strict/1761587383 -> viable/strict/1761587383 2025-12-04T08:57:06.6129637Z * [new tag] viable/strict/1761591165 -> viable/strict/1761591165 2025-12-04T08:57:06.6130892Z * [new tag] viable/strict/1761594575 -> viable/strict/1761594575 2025-12-04T08:57:06.6132142Z * [new tag] viable/strict/1761596710 -> viable/strict/1761596710 2025-12-04T08:57:06.6133455Z * [new tag] viable/strict/1761598189 -> viable/strict/1761598189 2025-12-04T08:57:06.6134783Z * [new tag] viable/strict/1761600254 -> viable/strict/1761600254 2025-12-04T08:57:06.6136102Z * [new tag] viable/strict/1761603879 -> viable/strict/1761603879 2025-12-04T08:57:06.6137473Z * [new tag] viable/strict/1761605429 -> viable/strict/1761605429 2025-12-04T08:57:06.6138865Z * [new tag] viable/strict/1761607468 -> viable/strict/1761607468 2025-12-04T08:57:06.6140166Z * [new tag] viable/strict/1761608983 -> viable/strict/1761608983 2025-12-04T08:57:06.6141542Z * [new tag] viable/strict/1761611846 -> viable/strict/1761611846 2025-12-04T08:57:06.6142903Z * [new tag] viable/strict/1761613922 -> viable/strict/1761613922 2025-12-04T08:57:06.6144122Z * [new tag] viable/strict/1761616504 -> viable/strict/1761616504 2025-12-04T08:57:06.6145332Z * [new tag] viable/strict/1761619599 -> viable/strict/1761619599 2025-12-04T08:57:06.6146597Z * [new tag] viable/strict/1761686693 -> viable/strict/1761686693 2025-12-04T08:57:06.6147887Z * [new tag] viable/strict/1761688179 -> viable/strict/1761688179 
2025-12-04T08:57:06.6149290Z * [new tag] viable/strict/1761691973 -> viable/strict/1761691973 2025-12-04T08:57:06.6150756Z * [new tag] viable/strict/1761693884 -> viable/strict/1761693884 2025-12-04T08:57:06.6152133Z * [new tag] viable/strict/1761695389 -> viable/strict/1761695389 2025-12-04T08:57:06.6153513Z * [new tag] viable/strict/1761698408 -> viable/strict/1761698408 2025-12-04T08:57:06.6154879Z * [new tag] viable/strict/1761702931 -> viable/strict/1761702931 2025-12-04T08:57:06.6156197Z * [new tag] viable/strict/1761706307 -> viable/strict/1761706307 2025-12-04T08:57:06.6157528Z * [new tag] viable/strict/1761709065 -> viable/strict/1761709065 2025-12-04T08:57:06.6158974Z * [new tag] viable/strict/1761710285 -> viable/strict/1761710285 2025-12-04T08:57:06.6160502Z * [new tag] viable/strict/1761711983 -> viable/strict/1761711983 2025-12-04T08:57:06.6161915Z * [new tag] viable/strict/1761713514 -> viable/strict/1761713514 2025-12-04T08:57:06.6163290Z * [new tag] viable/strict/1761715523 -> viable/strict/1761715523 2025-12-04T08:57:06.6164715Z * [new tag] viable/strict/1761727973 -> viable/strict/1761727973 2025-12-04T08:57:06.6166073Z * [new tag] viable/strict/1761751558 -> viable/strict/1761751558 2025-12-04T08:57:06.6167415Z * [new tag] viable/strict/1761755187 -> viable/strict/1761755187 2025-12-04T08:57:06.6168785Z * [new tag] viable/strict/1761756826 -> viable/strict/1761756826 2025-12-04T08:57:06.6170177Z * [new tag] viable/strict/1761769551 -> viable/strict/1761769551 2025-12-04T08:57:06.6171559Z * [new tag] viable/strict/1761771032 -> viable/strict/1761771032 2025-12-04T08:57:06.6172796Z * [new tag] viable/strict/1761773101 -> viable/strict/1761773101 2025-12-04T08:57:06.6174179Z * [new tag] viable/strict/1761781792 -> viable/strict/1761781792 2025-12-04T08:57:06.6175587Z * [new tag] viable/strict/1761784788 -> viable/strict/1761784788 2025-12-04T08:57:06.6176888Z * [new tag] viable/strict/1761786740 -> viable/strict/1761786740 
2025-12-04T08:57:06.6178380Z * [new tag] viable/strict/1761789332 -> viable/strict/1761789332 2025-12-04T08:57:06.6180004Z * [new tag] viable/strict/1761792569 -> viable/strict/1761792569 2025-12-04T08:57:06.6181358Z * [new tag] viable/strict/1761795289 -> viable/strict/1761795289 2025-12-04T08:57:06.6183127Z * [new tag] viable/strict/1761798345 -> viable/strict/1761798345 2025-12-04T08:57:06.6184516Z * [new tag] viable/strict/1761799827 -> viable/strict/1761799827 2025-12-04T08:57:06.6185900Z * [new tag] viable/strict/1761805604 -> viable/strict/1761805604 2025-12-04T08:57:06.6187274Z * [new tag] viable/strict/1761807202 -> viable/strict/1761807202 2025-12-04T08:57:06.6188598Z * [new tag] viable/strict/1761809094 -> viable/strict/1761809094 2025-12-04T08:57:06.6189942Z * [new tag] viable/strict/1761810576 -> viable/strict/1761810576 2025-12-04T08:57:06.6191340Z * [new tag] viable/strict/1761812771 -> viable/strict/1761812771 2025-12-04T08:57:06.6192738Z * [new tag] viable/strict/1761814363 -> viable/strict/1761814363 2025-12-04T08:57:06.6194180Z * [new tag] viable/strict/1761857410 -> viable/strict/1761857410 2025-12-04T08:57:06.6195538Z * [new tag] viable/strict/1761860985 -> viable/strict/1761860985 2025-12-04T08:57:06.6196903Z * [new tag] viable/strict/1761863094 -> viable/strict/1761863094 2025-12-04T08:57:06.6198247Z * [new tag] viable/strict/1761864590 -> viable/strict/1761864590 2025-12-04T08:57:06.6199588Z * [new tag] viable/strict/1761866675 -> viable/strict/1761866675 2025-12-04T08:57:06.6201240Z * [new tag] viable/strict/1761868178 -> viable/strict/1761868178 2025-12-04T08:57:06.6202578Z * [new tag] viable/strict/1761871111 -> viable/strict/1761871111 2025-12-04T08:57:06.6203980Z * [new tag] viable/strict/1761873126 -> viable/strict/1761873126 2025-12-04T08:57:06.6205389Z * [new tag] viable/strict/1761875714 -> viable/strict/1761875714 2025-12-04T08:57:06.6206770Z * [new tag] viable/strict/1761878924 -> viable/strict/1761878924 
2025-12-04T08:57:06.6208160Z * [new tag] viable/strict/1761881727 -> viable/strict/1761881727 2025-12-04T08:57:06.6209582Z * [new tag] viable/strict/1761882959 -> viable/strict/1761882959 2025-12-04T08:57:06.6211732Z * [new tag] viable/strict/1761886268 -> viable/strict/1761886268 2025-12-04T08:57:06.6213154Z * [new tag] viable/strict/1761893641 -> viable/strict/1761893641 2025-12-04T08:57:06.6214145Z * [new tag] viable/strict/1761931517 -> viable/strict/1761931517 2025-12-04T08:57:06.6215676Z * [new tag] viable/strict/1761933080 -> viable/strict/1761933080 2025-12-04T08:57:06.6217161Z * [new tag] viable/strict/1761935217 -> viable/strict/1761935217 2025-12-04T08:57:06.6218615Z * [new tag] viable/strict/1761938533 -> viable/strict/1761938533 2025-12-04T08:57:06.6219983Z * [new tag] viable/strict/1761940184 -> viable/strict/1761940184 2025-12-04T08:57:06.6221387Z * [new tag] viable/strict/1761942338 -> viable/strict/1761942338 2025-12-04T08:57:06.6222664Z * [new tag] viable/strict/1761946100 -> viable/strict/1761946100 2025-12-04T08:57:06.6224046Z * [new tag] viable/strict/1761947374 -> viable/strict/1761947374 2025-12-04T08:57:06.6225387Z * [new tag] viable/strict/1761950978 -> viable/strict/1761950978 2025-12-04T08:57:06.6226797Z * [new tag] viable/strict/1761957727 -> viable/strict/1761957727 2025-12-04T08:57:06.6228128Z * [new tag] viable/strict/1761959532 -> viable/strict/1761959532 2025-12-04T08:57:06.6229665Z * [new tag] viable/strict/1761965366 -> viable/strict/1761965366 2025-12-04T08:57:06.6231074Z * [new tag] viable/strict/1761968066 -> viable/strict/1761968066 2025-12-04T08:57:06.6232395Z * [new tag] viable/strict/1761969322 -> viable/strict/1761969322 2025-12-04T08:57:06.6233717Z * [new tag] viable/strict/1761974723 -> viable/strict/1761974723 2025-12-04T08:57:06.6235098Z * [new tag] viable/strict/1761981837 -> viable/strict/1761981837 2025-12-04T08:57:06.6236520Z * [new tag] viable/strict/1761985546 -> viable/strict/1761985546 
2025-12-04T08:57:06.6237951Z * [new tag] viable/strict/1761987030 -> viable/strict/1761987030 2025-12-04T08:57:06.6239364Z * [new tag] viable/strict/1762003554 -> viable/strict/1762003554 2025-12-04T08:57:06.6240827Z * [new tag] viable/strict/1762021560 -> viable/strict/1762021560 2025-12-04T08:57:06.6242187Z * [new tag] viable/strict/1762032190 -> viable/strict/1762032190 2025-12-04T08:57:06.6243579Z * [new tag] viable/strict/1762040981 -> viable/strict/1762040981 2025-12-04T08:57:06.6244921Z * [new tag] viable/strict/1762048525 -> viable/strict/1762048525 2025-12-04T08:57:06.6246273Z * [new tag] viable/strict/1762104223 -> viable/strict/1762104223 2025-12-04T08:57:06.6247657Z * [new tag] viable/strict/1762105778 -> viable/strict/1762105778 2025-12-04T08:57:06.6248978Z * [new tag] viable/strict/1762115109 -> viable/strict/1762115109 2025-12-04T08:57:06.6250327Z * [new tag] viable/strict/1762125840 -> viable/strict/1762125840 2025-12-04T08:57:06.6251480Z * [new tag] viable/strict/1762127377 -> viable/strict/1762127377 2025-12-04T08:57:06.6253163Z * [new tag] viable/strict/1762134925 -> viable/strict/1762134925 2025-12-04T08:57:06.6254422Z * [new tag] viable/strict/1762138338 -> viable/strict/1762138338 2025-12-04T08:57:06.6255780Z * [new tag] viable/strict/1762148993 -> viable/strict/1762148993 2025-12-04T08:57:06.6257215Z * [new tag] viable/strict/1762152871 -> viable/strict/1762152871 2025-12-04T08:57:06.6258600Z * [new tag] viable/strict/1762156183 -> viable/strict/1762156183 2025-12-04T08:57:06.6259987Z * [new tag] viable/strict/1762163457 -> viable/strict/1762163457 2025-12-04T08:57:06.6261304Z * [new tag] viable/strict/1762165569 -> viable/strict/1762165569 2025-12-04T08:57:06.6262773Z * [new tag] viable/strict/1762169035 -> viable/strict/1762169035 2025-12-04T08:57:06.6264171Z * [new tag] viable/strict/1762174936 -> viable/strict/1762174936 2025-12-04T08:57:06.6265504Z * [new tag] viable/strict/1762194412 -> viable/strict/1762194412 
2025-12-04T08:57:06.6266872Z * [new tag] viable/strict/1762195876 -> viable/strict/1762195876 2025-12-04T08:57:06.6268284Z * [new tag] viable/strict/1762197788 -> viable/strict/1762197788 2025-12-04T08:57:06.6269719Z * [new tag] viable/strict/1762199389 -> viable/strict/1762199389 2025-12-04T08:57:06.6271193Z * [new tag] viable/strict/1762206585 -> viable/strict/1762206585 2025-12-04T08:57:06.6273054Z * [new tag] viable/strict/1762210184 -> viable/strict/1762210184 2025-12-04T08:57:06.6274338Z * [new tag] viable/strict/1762218736 -> viable/strict/1762218736 2025-12-04T08:57:06.6275728Z * [new tag] viable/strict/1762224529 -> viable/strict/1762224529 2025-12-04T08:57:06.6277119Z * [new tag] viable/strict/1762227253 -> viable/strict/1762227253 2025-12-04T08:57:06.6278373Z * [new tag] viable/strict/1762228515 -> viable/strict/1762228515 2025-12-04T08:57:06.6280048Z * [new tag] viable/strict/1762230349 -> viable/strict/1762230349 2025-12-04T08:57:06.6281332Z * [new tag] viable/strict/1762231859 -> viable/strict/1762231859 2025-12-04T08:57:06.6282727Z * [new tag] viable/strict/1762233925 -> viable/strict/1762233925 2025-12-04T08:57:06.6284185Z * [new tag] viable/strict/1762237630 -> viable/strict/1762237630 2025-12-04T08:57:06.6285456Z * [new tag] viable/strict/1762253522 -> viable/strict/1762253522 2025-12-04T08:57:06.6286971Z * [new tag] viable/strict/1762278588 -> viable/strict/1762278588 2025-12-04T08:57:06.6288364Z * [new tag] viable/strict/1762284203 -> viable/strict/1762284203 2025-12-04T08:57:06.6289720Z * [new tag] viable/strict/1762289446 -> viable/strict/1762289446 2025-12-04T08:57:06.6291069Z * [new tag] viable/strict/1762291515 -> viable/strict/1762291515 2025-12-04T08:57:06.6292455Z * [new tag] viable/strict/1762295100 -> viable/strict/1762295100 2025-12-04T08:57:06.6293621Z * [new tag] viable/strict/1762296590 -> viable/strict/1762296590 2025-12-04T08:57:06.6294889Z * [new tag] viable/strict/1762300179 -> viable/strict/1762300179 
2025-12-04T08:57:06.6296054Z * [new tag] viable/strict/1762303207 -> viable/strict/1762303207 2025-12-04T08:57:06.6297509Z * [new tag] viable/strict/1762386584 -> viable/strict/1762386584 2025-12-04T08:57:06.6298900Z * [new tag] viable/strict/1762391537 -> viable/strict/1762391537 2025-12-04T08:57:06.6300147Z * [new tag] viable/strict/1762394119 -> viable/strict/1762394119 2025-12-04T08:57:06.6301712Z * [new tag] viable/strict/1762397437 -> viable/strict/1762397437 2025-12-04T08:57:06.6303164Z * [new tag] viable/strict/1762400256 -> viable/strict/1762400256 2025-12-04T08:57:06.6304490Z * [new tag] viable/strict/1762401469 -> viable/strict/1762401469 2025-12-04T08:57:06.6305849Z * [new tag] viable/strict/1762408195 -> viable/strict/1762408195 2025-12-04T08:57:06.6307226Z * [new tag] viable/strict/1762410411 -> viable/strict/1762410411 2025-12-04T08:57:06.6308741Z * [new tag] viable/strict/1762417613 -> viable/strict/1762417613 2025-12-04T08:57:06.6310114Z * [new tag] viable/strict/1762419198 -> viable/strict/1762419198 2025-12-04T08:57:06.6311472Z * [new tag] viable/strict/1762422656 -> viable/strict/1762422656 2025-12-04T08:57:06.6313092Z * [new tag] viable/strict/1762424746 -> viable/strict/1762424746 2025-12-04T08:57:06.6314551Z * [new tag] viable/strict/1762446386 -> viable/strict/1762446386 2025-12-04T08:57:06.6315915Z * [new tag] viable/strict/1762449912 -> viable/strict/1762449912 2025-12-04T08:57:06.6317284Z * [new tag] viable/strict/1762457031 -> viable/strict/1762457031 2025-12-04T08:57:06.6318796Z * [new tag] viable/strict/1762462441 -> viable/strict/1762462441 2025-12-04T08:57:06.6320293Z * [new tag] viable/strict/1762467909 -> viable/strict/1762467909 2025-12-04T08:57:06.6321677Z * [new tag] viable/strict/1762471493 -> viable/strict/1762471493 2025-12-04T08:57:06.6323054Z * [new tag] viable/strict/1762475990 -> viable/strict/1762475990 2025-12-04T08:57:06.6324497Z * [new tag] viable/strict/1762477933 -> viable/strict/1762477933 
2025-12-04T08:57:06.6325949Z * [new tag] viable/strict/1762491053 -> viable/strict/1762491053 2025-12-04T08:57:06.6327284Z * [new tag] viable/strict/1762493118 -> viable/strict/1762493118 2025-12-04T08:57:06.6328692Z * [new tag] viable/strict/1762498442 -> viable/strict/1762498442 2025-12-04T08:57:06.6330201Z * [new tag] viable/strict/1762501778 -> viable/strict/1762501778 2025-12-04T08:57:06.6331534Z * [new tag] viable/strict/1762504001 -> viable/strict/1762504001 2025-12-04T08:57:06.6332922Z * [new tag] viable/strict/1762505583 -> viable/strict/1762505583 2025-12-04T08:57:06.6334324Z * [new tag] viable/strict/1762507523 -> viable/strict/1762507523 2025-12-04T08:57:06.6335751Z * [new tag] viable/strict/1762511140 -> viable/strict/1762511140 2025-12-04T08:57:06.6337194Z * [new tag] viable/strict/1762512632 -> viable/strict/1762512632 2025-12-04T08:57:06.6338586Z * [new tag] viable/strict/1762520467 -> viable/strict/1762520467 2025-12-04T08:57:06.6339972Z * [new tag] viable/strict/1762522016 -> viable/strict/1762522016 2025-12-04T08:57:06.6341365Z * [new tag] viable/strict/1762530591 -> viable/strict/1762530591 2025-12-04T08:57:06.6342743Z * [new tag] viable/strict/1762543405 -> viable/strict/1762543405 2025-12-04T08:57:06.6343986Z * [new tag] viable/strict/1762544998 -> viable/strict/1762544998 2025-12-04T08:57:06.6345324Z * [new tag] viable/strict/1762552182 -> viable/strict/1762552182 2025-12-04T08:57:06.6346686Z * [new tag] viable/strict/1762554297 -> viable/strict/1762554297 2025-12-04T08:57:06.6347845Z * [new tag] viable/strict/1762559381 -> viable/strict/1762559381 2025-12-04T08:57:06.6349268Z * [new tag] viable/strict/1762562222 -> viable/strict/1762562222 2025-12-04T08:57:06.6350680Z * [new tag] viable/strict/1762564319 -> viable/strict/1762564319 2025-12-04T08:57:06.6351951Z * [new tag] viable/strict/1762566904 -> viable/strict/1762566904 2025-12-04T08:57:06.6353272Z * [new tag] viable/strict/1762569781 -> viable/strict/1762569781 
2025-12-04T08:57:06.6354617Z * [new tag] viable/strict/1762575940 -> viable/strict/1762575940 2025-12-04T08:57:06.6355975Z * [new tag] viable/strict/1762580974 -> viable/strict/1762580974 2025-12-04T08:57:06.6357369Z * [new tag] viable/strict/1762583185 -> viable/strict/1762583185 2025-12-04T08:57:06.6358708Z * [new tag] viable/strict/1762586647 -> viable/strict/1762586647 2025-12-04T08:57:06.6360218Z * [new tag] viable/strict/1762588183 -> viable/strict/1762588183 2025-12-04T08:57:06.6361989Z * [new tag] viable/strict/1762593886 -> viable/strict/1762593886 2025-12-04T08:57:06.6363413Z * [new tag] viable/strict/1762650743 -> viable/strict/1762650743 2025-12-04T08:57:06.6364801Z * [new tag] viable/strict/1762653328 -> viable/strict/1762653328 2025-12-04T08:57:06.6366170Z * [new tag] viable/strict/1762659342 -> viable/strict/1762659342 2025-12-04T08:57:06.6367547Z * [new tag] viable/strict/1762662360 -> viable/strict/1762662360 2025-12-04T08:57:06.6368965Z * [new tag] viable/strict/1762667377 -> viable/strict/1762667377 2025-12-04T08:57:06.6370310Z * [new tag] viable/strict/1762671090 -> viable/strict/1762671090 2025-12-04T08:57:06.6371655Z * [new tag] viable/strict/1762680284 -> viable/strict/1762680284 2025-12-04T08:57:06.6373084Z * [new tag] viable/strict/1762683900 -> viable/strict/1762683900 2025-12-04T08:57:06.6374471Z * [new tag] viable/strict/1762705541 -> viable/strict/1762705541 2025-12-04T08:57:06.6375816Z * [new tag] viable/strict/1762709004 -> viable/strict/1762709004 2025-12-04T08:57:06.6377186Z * [new tag] viable/strict/1762746004 -> viable/strict/1762746004 2025-12-04T08:57:06.6378633Z * [new tag] viable/strict/1762748799 -> viable/strict/1762748799 2025-12-04T08:57:06.6380170Z * [new tag] viable/strict/1762759504 -> viable/strict/1762759504 2025-12-04T08:57:06.6381590Z * [new tag] viable/strict/1762760973 -> viable/strict/1762760973 2025-12-04T08:57:06.6382906Z * [new tag] viable/strict/1762775374 -> viable/strict/1762775374 
2025-12-04T08:57:06.6384268Z * [new tag] viable/strict/1762777661 -> viable/strict/1762777661 2025-12-04T08:57:06.6385645Z * [new tag] viable/strict/1762779774 -> viable/strict/1762779774 2025-12-04T08:57:06.6387182Z * [new tag] viable/strict/1762781259 -> viable/strict/1762781259 2025-12-04T08:57:06.6388602Z * [new tag] viable/strict/1762793628 -> viable/strict/1762793628 2025-12-04T08:57:06.6389999Z * [new tag] viable/strict/1762800711 -> viable/strict/1762800711 2025-12-04T08:57:06.6391391Z * [new tag] viable/strict/1762809894 -> viable/strict/1762809894 2025-12-04T08:57:06.6392704Z * [new tag] viable/strict/1762811384 -> viable/strict/1762811384 2025-12-04T08:57:06.6394146Z * [new tag] viable/strict/1762813841 -> viable/strict/1762813841 2025-12-04T08:57:06.6395527Z * [new tag] viable/strict/1762815047 -> viable/strict/1762815047 2025-12-04T08:57:06.6397022Z * [new tag] viable/strict/1762817094 -> viable/strict/1762817094 2025-12-04T08:57:06.6398430Z * [new tag] viable/strict/1762818582 -> viable/strict/1762818582 2025-12-04T08:57:06.6399806Z * [new tag] viable/strict/1762821623 -> viable/strict/1762821623 2025-12-04T08:57:06.6401188Z * [new tag] viable/strict/1762823531 -> viable/strict/1762823531 2025-12-04T08:57:06.6402631Z * [new tag] viable/strict/1762849583 -> viable/strict/1762849583 2025-12-04T08:57:06.6403956Z * [new tag] viable/strict/1762851200 -> viable/strict/1762851200 2025-12-04T08:57:06.6405296Z * [new tag] viable/strict/1762854603 -> viable/strict/1762854603 2025-12-04T08:57:06.6406689Z * [new tag] viable/strict/1762858276 -> viable/strict/1762858276 2025-12-04T08:57:06.6408124Z * [new tag] viable/strict/1762860891 -> viable/strict/1762860891 2025-12-04T08:57:06.6409846Z * [new tag] viable/strict/1762866174 -> viable/strict/1762866174 2025-12-04T08:57:06.6411308Z * [new tag] viable/strict/1762867653 -> viable/strict/1762867653 2025-12-04T08:57:06.6412687Z * [new tag] viable/strict/1762872669 -> viable/strict/1762872669 
2025-12-04T08:57:06.6413968Z * [new tag] viable/strict/1762878380 -> viable/strict/1762878380 2025-12-04T08:57:06.6415308Z * [new tag] viable/strict/1762889003 -> viable/strict/1762889003 2025-12-04T08:57:06.6416723Z * [new tag] viable/strict/1762890589 -> viable/strict/1762890589 2025-12-04T08:57:06.6420349Z * [new tag] viable/strict/1762892743 -> viable/strict/1762892743 2025-12-04T08:57:06.6421798Z * [new tag] viable/strict/1762894271 -> viable/strict/1762894271 2025-12-04T08:57:06.6422993Z * [new tag] viable/strict/1762896287 -> viable/strict/1762896287 2025-12-04T08:57:06.6424358Z * [new tag] viable/strict/1762915871 -> viable/strict/1762915871 2025-12-04T08:57:06.6425735Z * [new tag] viable/strict/1762918569 -> viable/strict/1762918569 2025-12-04T08:57:06.6427015Z * [new tag] viable/strict/1762919776 -> viable/strict/1762919776 2025-12-04T08:57:06.6428392Z * [new tag] viable/strict/1762923072 -> viable/strict/1762923072 2025-12-04T08:57:06.6429776Z * [new tag] viable/strict/1762928826 -> viable/strict/1762928826 2025-12-04T08:57:06.6431345Z * [new tag] viable/strict/1762930451 -> viable/strict/1762930451 2025-12-04T08:57:06.6432923Z * [new tag] viable/strict/1762933780 -> viable/strict/1762933780 2025-12-04T08:57:06.6434177Z * [new tag] viable/strict/1762937638 -> viable/strict/1762937638 2025-12-04T08:57:06.6435679Z * [new tag] viable/strict/1762939545 -> viable/strict/1762939545 2025-12-04T08:57:06.6437075Z * [new tag] viable/strict/1762962692 -> viable/strict/1762962692 2025-12-04T08:57:06.6438466Z * [new tag] viable/strict/1762979143 -> viable/strict/1762979143 2025-12-04T08:57:06.6439879Z * [new tag] viable/strict/1762984188 -> viable/strict/1762984188 2025-12-04T08:57:06.6441258Z * [new tag] viable/strict/1762986306 -> viable/strict/1762986306 2025-12-04T08:57:06.6442640Z * [new tag] viable/strict/1762989903 -> viable/strict/1762989903 2025-12-04T08:57:06.6444002Z * [new tag] viable/strict/1762991377 -> viable/strict/1762991377 
2025-12-04T08:57:06.6445361Z * [new tag] viable/strict/1762998921 -> viable/strict/1762998921 2025-12-04T08:57:06.6446755Z * [new tag] viable/strict/1763002287 -> viable/strict/1763002287 2025-12-04T08:57:06.6448194Z * [new tag] viable/strict/1763016840 -> viable/strict/1763016840 2025-12-04T08:57:06.6449550Z * [new tag] viable/strict/1763020180 -> viable/strict/1763020180 2025-12-04T08:57:06.6450926Z * [new tag] viable/strict/1763027421 -> viable/strict/1763027421 2025-12-04T08:57:06.6452284Z * [new tag] viable/strict/1763031120 -> viable/strict/1763031120 2025-12-04T08:57:06.6454071Z * [new tag] viable/strict/1763036861 -> viable/strict/1763036861 2025-12-04T08:57:06.6455496Z * [new tag] viable/strict/1763038993 -> viable/strict/1763038993 2025-12-04T08:57:06.6456886Z * [new tag] viable/strict/1763054703 -> viable/strict/1763054703 2025-12-04T08:57:06.6458143Z * [new tag] viable/strict/1763067061 -> viable/strict/1763067061 2025-12-04T08:57:06.6459525Z * [new tag] viable/strict/1763070847 -> viable/strict/1763070847 2025-12-04T08:57:06.6460890Z * [new tag] viable/strict/1763072706 -> viable/strict/1763072706 2025-12-04T08:57:06.6462330Z * [new tag] viable/strict/1763076302 -> viable/strict/1763076302 2025-12-04T08:57:06.6463691Z * [new tag] viable/strict/1763080816 -> viable/strict/1763080816 2025-12-04T08:57:06.6465166Z * [new tag] viable/strict/1763082732 -> viable/strict/1763082732 2025-12-04T08:57:06.6466541Z * [new tag] viable/strict/1763085329 -> viable/strict/1763085329 2025-12-04T08:57:06.6467929Z * [new tag] viable/strict/1763088623 -> viable/strict/1763088623 2025-12-04T08:57:06.6469391Z * [new tag] viable/strict/1763091402 -> viable/strict/1763091402 2025-12-04T08:57:06.6470801Z * [new tag] viable/strict/1763092602 -> viable/strict/1763092602 2025-12-04T08:57:06.6472156Z * [new tag] viable/strict/1763094355 -> viable/strict/1763094355 2025-12-04T08:57:06.6473513Z * [new tag] viable/strict/1763099390 -> viable/strict/1763099390 
2025-12-04T08:57:06.6474882Z * [new tag] viable/strict/1763101608 -> viable/strict/1763101608 2025-12-04T08:57:06.6476333Z * [new tag] viable/strict/1763105102 -> viable/strict/1763105102 2025-12-04T08:57:06.6477723Z * [new tag] viable/strict/1763112347 -> viable/strict/1763112347 2025-12-04T08:57:06.6479076Z * [new tag] viable/strict/1763119471 -> viable/strict/1763119471 2025-12-04T08:57:06.6480392Z * [new tag] viable/strict/1763126835 -> viable/strict/1763126835 2025-12-04T08:57:06.6481670Z * [new tag] viable/strict/1763149779 -> viable/strict/1763149779 2025-12-04T08:57:06.6483127Z * [new tag] viable/strict/1763164178 -> viable/strict/1763164178 2025-12-04T08:57:06.6484393Z * [new tag] viable/strict/1763167104 -> viable/strict/1763167104 2025-12-04T08:57:06.6485742Z * [new tag] viable/strict/1763169132 -> viable/strict/1763169132 2025-12-04T08:57:06.6487135Z * [new tag] viable/strict/1763171708 -> viable/strict/1763171708 2025-12-04T08:57:06.6488444Z * [new tag] viable/strict/1763174759 -> viable/strict/1763174759 2025-12-04T08:57:06.6489812Z * [new tag] viable/strict/1763180744 -> viable/strict/1763180744 2025-12-04T08:57:06.6491160Z * [new tag] viable/strict/1763182227 -> viable/strict/1763182227 2025-12-04T08:57:06.6492551Z * [new tag] viable/strict/1763184309 -> viable/strict/1763184309 2025-12-04T08:57:06.6494249Z * [new tag] viable/strict/1763187991 -> viable/strict/1763187991 2025-12-04T08:57:06.6495652Z * [new tag] viable/strict/1763191445 -> viable/strict/1763191445 2025-12-04T08:57:06.6497158Z * [new tag] viable/strict/1763195152 -> viable/strict/1763195152 2025-12-04T08:57:06.6498522Z * [new tag] viable/strict/1763205769 -> viable/strict/1763205769 2025-12-04T08:57:06.6499828Z * [new tag] viable/strict/1763246990 -> viable/strict/1763246990 2025-12-04T08:57:06.6501225Z * [new tag] viable/strict/1763261578 -> viable/strict/1763261578 2025-12-04T08:57:06.6502515Z * [new tag] viable/strict/1763286573 -> viable/strict/1763286573 
2025-12-04T08:57:06.6503802Z * [new tag] viable/strict/1763292167 -> viable/strict/1763292167 2025-12-04T08:57:06.6505167Z * [new tag] viable/strict/1763333386 -> viable/strict/1763333386 2025-12-04T08:57:06.6506515Z * [new tag] viable/strict/1763340082 -> viable/strict/1763340082 2025-12-04T08:57:06.6508515Z * [new tag] viable/strict/1763364324 -> viable/strict/1763364324 2025-12-04T08:57:06.6510004Z * [new tag] viable/strict/1763371569 -> viable/strict/1763371569 2025-12-04T08:57:06.6511357Z * [new tag] viable/strict/1763373067 -> viable/strict/1763373067 2025-12-04T08:57:06.6512675Z * [new tag] viable/strict/1763375157 -> viable/strict/1763375157 2025-12-04T08:57:06.6514041Z * [new tag] viable/strict/1763382462 -> viable/strict/1763382462 2025-12-04T08:57:06.6515492Z * [new tag] viable/strict/1763394661 -> viable/strict/1763394661 2025-12-04T08:57:06.6516958Z * [new tag] viable/strict/1763396797 -> viable/strict/1763396797 2025-12-04T08:57:06.6518633Z * [new tag] viable/strict/1763398542 -> viable/strict/1763398542 2025-12-04T08:57:06.6520113Z * [new tag] viable/strict/1763401807 -> viable/strict/1763401807 2025-12-04T08:57:06.6521389Z * [new tag] viable/strict/1763414698 -> viable/strict/1763414698 2025-12-04T08:57:06.6522808Z * [new tag] viable/strict/1763419807 -> viable/strict/1763419807 2025-12-04T08:57:06.6524177Z * [new tag] viable/strict/1763426369 -> viable/strict/1763426369 2025-12-04T08:57:06.6525579Z * [new tag] viable/strict/1763428331 -> viable/strict/1763428331 2025-12-04T08:57:06.6526980Z * [new tag] viable/strict/1763430922 -> viable/strict/1763430922 2025-12-04T08:57:06.6528233Z * [new tag] viable/strict/1763434184 -> viable/strict/1763434184 2025-12-04T08:57:06.6529612Z * [new tag] viable/strict/1763439973 -> viable/strict/1763439973 2025-12-04T08:57:06.6530993Z * [new tag] viable/strict/1763444995 -> viable/strict/1763444995 2025-12-04T08:57:06.6532499Z * [new tag] viable/strict/1763447206 -> viable/strict/1763447206 
2025-12-04T08:57:06.6533814Z * [new tag] viable/strict/1763448826 -> viable/strict/1763448826 2025-12-04T08:57:06.6535135Z * [new tag] viable/strict/1763450717 -> viable/strict/1763450717 2025-12-04T08:57:06.6536517Z * [new tag] viable/strict/1763452183 -> viable/strict/1763452183 2025-12-04T08:57:06.6537902Z * [new tag] viable/strict/1763457945 -> viable/strict/1763457945 2025-12-04T08:57:06.6539246Z * [new tag] viable/strict/1763459439 -> viable/strict/1763459439 2025-12-04T08:57:06.6540509Z * [new tag] viable/strict/1763461556 -> viable/strict/1763461556 2025-12-04T08:57:06.6541884Z * [new tag] viable/strict/1763463103 -> viable/strict/1763463103 2025-12-04T08:57:06.6543650Z * [new tag] viable/strict/1763465100 -> viable/strict/1763465100 2025-12-04T08:57:06.6544961Z * [new tag] viable/strict/1763468866 -> viable/strict/1763468866 2025-12-04T08:57:06.6546212Z * [new tag] viable/strict/1763493823 -> viable/strict/1763493823 2025-12-04T08:57:06.6547482Z * [new tag] viable/strict/1763496249 -> viable/strict/1763496249 2025-12-04T08:57:06.6548931Z * [new tag] viable/strict/1763502620 -> viable/strict/1763502620 2025-12-04T08:57:06.6550335Z * [new tag] viable/strict/1763504715 -> viable/strict/1763504715 2025-12-04T08:57:06.6551696Z * [new tag] viable/strict/1763506208 -> viable/strict/1763506208 2025-12-04T08:57:06.6553122Z * [new tag] viable/strict/1763520590 -> viable/strict/1763520590 2025-12-04T08:57:06.6554597Z * [new tag] viable/strict/1763523357 -> viable/strict/1763523357 2025-12-04T08:57:06.6556004Z * [new tag] viable/strict/1763529922 -> viable/strict/1763529922 2025-12-04T08:57:06.6557422Z * [new tag] viable/strict/1763531408 -> viable/strict/1763531408 2025-12-04T08:57:06.6558848Z * [new tag] viable/strict/1763533622 -> viable/strict/1763533622 2025-12-04T08:57:06.6560271Z * [new tag] viable/strict/1763538576 -> viable/strict/1763538576 2025-12-04T08:57:06.6561696Z * [new tag] viable/strict/1763545823 -> viable/strict/1763545823 
2025-12-04T08:57:06.6562850Z * [new tag] viable/strict/1763547951 -> viable/strict/1763547951 2025-12-04T08:57:06.6564380Z * [new tag] viable/strict/1763551477 -> viable/strict/1763551477 2025-12-04T08:57:06.6565735Z * [new tag] viable/strict/1763552982 -> viable/strict/1763552982 2025-12-04T08:57:06.6567100Z * [new tag] viable/strict/1763594698 -> viable/strict/1763594698 2025-12-04T08:57:06.6568499Z * [new tag] viable/strict/1763596178 -> viable/strict/1763596178 2025-12-04T08:57:06.6569925Z * [new tag] viable/strict/1763599155 -> viable/strict/1763599155 2025-12-04T08:57:06.6571243Z * [new tag] viable/strict/1763603717 -> viable/strict/1763603717 2025-12-04T08:57:06.6572636Z * [new tag] viable/strict/1763606923 -> viable/strict/1763606923 2025-12-04T08:57:06.6574001Z * [new tag] viable/strict/1763609715 -> viable/strict/1763609715 2025-12-04T08:57:06.6575403Z * [new tag] viable/strict/1763612757 -> viable/strict/1763612757 2025-12-04T08:57:06.6576741Z * [new tag] viable/strict/1763616325 -> viable/strict/1763616325 2025-12-04T08:57:06.6578182Z * [new tag] viable/strict/1763623509 -> viable/strict/1763623509 2025-12-04T08:57:06.6580106Z * [new tag] viable/strict/1763624984 -> viable/strict/1763624984 2025-12-04T08:57:06.6580981Z * [new tag] viable/strict/1763628796 -> viable/strict/1763628796 2025-12-04T08:57:06.6582484Z * [new tag] viable/strict/1763634343 -> viable/strict/1763634343 2025-12-04T08:57:06.6583775Z * [new tag] viable/strict/1763635867 -> viable/strict/1763635867 2025-12-04T08:57:06.6585406Z * [new tag] viable/strict/1763639382 -> viable/strict/1763639382 2025-12-04T08:57:06.6586656Z * [new tag] viable/strict/1763646626 -> viable/strict/1763646626 2025-12-04T08:57:06.6588130Z * [new tag] viable/strict/1763655997 -> viable/strict/1763655997 2025-12-04T08:57:06.6589497Z * [new tag] viable/strict/1763659444 -> viable/strict/1763659444 2025-12-04T08:57:06.6590898Z * [new tag] viable/strict/1763660992 -> viable/strict/1763660992 
2025-12-04T08:57:06.6592234Z * [new tag] viable/strict/1763663201 -> viable/strict/1763663201 2025-12-04T08:57:06.6593609Z * [new tag] viable/strict/1763670362 -> viable/strict/1763670362 2025-12-04T08:57:06.6594873Z * [new tag] viable/strict/1763675378 -> viable/strict/1763675378 2025-12-04T08:57:06.6596294Z * [new tag] viable/strict/1763693343 -> viable/strict/1763693343 2025-12-04T08:57:06.6597635Z * [new tag] viable/strict/1763696088 -> viable/strict/1763696088 2025-12-04T08:57:06.6599128Z * [new tag] viable/strict/1763697343 -> viable/strict/1763697343 2025-12-04T08:57:06.6600632Z * [new tag] viable/strict/1763699165 -> viable/strict/1763699165 2025-12-04T08:57:06.6602049Z * [new tag] viable/strict/1763700660 -> viable/strict/1763700660 2025-12-04T08:57:06.6603408Z * [new tag] viable/strict/1763704209 -> viable/strict/1763704209 2025-12-04T08:57:06.6604778Z * [new tag] viable/strict/1763706411 -> viable/strict/1763706411 2025-12-04T08:57:06.6606116Z * [new tag] viable/strict/1763708082 -> viable/strict/1763708082 2025-12-04T08:57:06.6607446Z * [new tag] viable/strict/1763711381 -> viable/strict/1763711381 2025-12-04T08:57:06.6608727Z * [new tag] viable/strict/1763713593 -> viable/strict/1763713593 2025-12-04T08:57:06.6610120Z * [new tag] viable/strict/1763715201 -> viable/strict/1763715201 2025-12-04T08:57:06.6611465Z * [new tag] viable/strict/1763733017 -> viable/strict/1763733017 2025-12-04T08:57:06.6612860Z * [new tag] viable/strict/1763735108 -> viable/strict/1763735108 2025-12-04T08:57:06.6614257Z * [new tag] viable/strict/1763749579 -> viable/strict/1763749579 2025-12-04T08:57:06.6615620Z * [new tag] viable/strict/1763751113 -> viable/strict/1763751113 2025-12-04T08:57:06.6617133Z * [new tag] viable/strict/1763753035 -> viable/strict/1763753035 2025-12-04T08:57:06.6618586Z * [new tag] viable/strict/1763754578 -> viable/strict/1763754578 2025-12-04T08:57:06.6620014Z * [new tag] viable/strict/1763756748 -> viable/strict/1763756748 
2025-12-04T08:57:06.6621396Z * [new tag] viable/strict/1763758205 -> viable/strict/1763758205 2025-12-04T08:57:06.6622879Z * [new tag] viable/strict/1763764050 -> viable/strict/1763764050 2025-12-04T08:57:06.6624374Z * [new tag] viable/strict/1763771887 -> viable/strict/1763771887 2025-12-04T08:57:06.6625870Z * [new tag] viable/strict/1763773920 -> viable/strict/1763773920 2025-12-04T08:57:06.6627231Z * [new tag] viable/strict/1763776501 -> viable/strict/1763776501 2025-12-04T08:57:06.6628574Z * [new tag] viable/strict/1763779437 -> viable/strict/1763779437 2025-12-04T08:57:06.6630109Z * [new tag] viable/strict/1763781038 -> viable/strict/1763781038 2025-12-04T08:57:06.6631395Z * [new tag] viable/strict/1763782245 -> viable/strict/1763782245 2025-12-04T08:57:06.6633410Z * [new tag] viable/strict/1763785568 -> viable/strict/1763785568 2025-12-04T08:57:06.6634841Z * [new tag] viable/strict/1763787006 -> viable/strict/1763787006 2025-12-04T08:57:06.6636230Z * [new tag] viable/strict/1763789103 -> viable/strict/1763789103 2025-12-04T08:57:06.6637566Z * [new tag] viable/strict/1763790578 -> viable/strict/1763790578 2025-12-04T08:57:06.6638905Z * [new tag] viable/strict/1763796275 -> viable/strict/1763796275 2025-12-04T08:57:06.6640557Z * [new tag] viable/strict/1763801465 -> viable/strict/1763801465 2025-12-04T08:57:06.6641993Z * [new tag] viable/strict/1763803522 -> viable/strict/1763803522 2025-12-04T08:57:06.6643288Z * [new tag] viable/strict/1763808581 -> viable/strict/1763808581 2025-12-04T08:57:06.6644740Z * [new tag] viable/strict/1763840977 -> viable/strict/1763840977 2025-12-04T08:57:06.6646161Z * [new tag] viable/strict/1763846659 -> viable/strict/1763846659 2025-12-04T08:57:06.6647598Z * [new tag] viable/strict/1763872065 -> viable/strict/1763872065 2025-12-04T08:57:06.6648955Z * [new tag] viable/strict/1763873648 -> viable/strict/1763873648 2025-12-04T08:57:06.6650345Z * [new tag] viable/strict/1763875506 -> viable/strict/1763875506 
2025-12-04T08:57:06.6651599Z * [new tag] viable/strict/1763889904 -> viable/strict/1763889904 2025-12-04T08:57:06.6653025Z * [new tag] viable/strict/1763930999 -> viable/strict/1763930999 2025-12-04T08:57:06.6654380Z * [new tag] viable/strict/1763944964 -> viable/strict/1763944964 2025-12-04T08:57:06.6655525Z * [new tag] viable/strict/1763958474 -> viable/strict/1763958474 2025-12-04T08:57:06.6656940Z * [new tag] viable/strict/1763967263 -> viable/strict/1763967263 2025-12-04T08:57:06.6658381Z * [new tag] viable/strict/1763972803 -> viable/strict/1763972803 2025-12-04T08:57:06.6659714Z * [new tag] viable/strict/1763976376 -> viable/strict/1763976376 2025-12-04T08:57:06.6661103Z * [new tag] viable/strict/1763989404 -> viable/strict/1763989404 2025-12-04T08:57:06.6662445Z * [new tag] viable/strict/1763990887 -> viable/strict/1763990887 2025-12-04T08:57:06.6663865Z * [new tag] viable/strict/1764019919 -> viable/strict/1764019919 2025-12-04T08:57:06.6665230Z * [new tag] viable/strict/1764023134 -> viable/strict/1764023134 2025-12-04T08:57:06.6666466Z * [new tag] viable/strict/1764024593 -> viable/strict/1764024593 2025-12-04T08:57:06.6667840Z * [new tag] viable/strict/1764026706 -> viable/strict/1764026706 2025-12-04T08:57:06.6669335Z * [new tag] viable/strict/1764031139 -> viable/strict/1764031139 2025-12-04T08:57:06.6670744Z * [new tag] viable/strict/1764033131 -> viable/strict/1764033131 2025-12-04T08:57:06.6672076Z * [new tag] viable/strict/1764035725 -> viable/strict/1764035725 2025-12-04T08:57:06.6673341Z * [new tag] viable/strict/1764624265 -> viable/strict/1764624265 2025-12-04T08:57:06.6674498Z * [new tag] viable/strict/1764631514 -> viable/strict/1764631514 2025-12-04T08:57:06.6675794Z * [new tag] viable/strict/1764632987 -> viable/strict/1764632987 2025-12-04T08:57:06.6676943Z * [new tag] viable/strict/1764636063 -> viable/strict/1764636063 2025-12-04T08:57:06.6678255Z * [new tag] viable/strict/1764643975 -> viable/strict/1764643975 
2025-12-04T08:57:06.6679366Z * [new tag] viable/strict/1764646859 -> viable/strict/1764646859 2025-12-04T08:57:06.6680773Z * [new tag] viable/strict/1764653120 -> viable/strict/1764653120 2025-12-04T08:57:06.6682150Z * [new tag] viable/strict/1764654632 -> viable/strict/1764654632 2025-12-04T08:57:06.6683175Z * [new tag] viable/strict/1764656821 -> viable/strict/1764656821 2025-12-04T08:57:06.6684514Z * [new tag] viable/strict/1764658557 -> viable/strict/1764658557 2025-12-04T08:57:06.6685670Z * [new tag] viable/strict/1764660333 -> viable/strict/1764660333 2025-12-04T08:57:06.6687035Z * [new tag] viable/strict/1764661812 -> viable/strict/1764661812 2025-12-04T08:57:06.6688238Z * [new tag] viable/strict/1764664023 -> viable/strict/1764664023 2025-12-04T08:57:06.6689485Z * [new tag] viable/strict/1764669150 -> viable/strict/1764669150 2025-12-04T08:57:06.6690681Z * [new tag] viable/strict/1764680709 -> viable/strict/1764680709 2025-12-04T08:57:06.6691929Z * [new tag] viable/strict/1764687619 -> viable/strict/1764687619 2025-12-04T08:57:06.6693197Z * [new tag] viable/strict/1764696355 -> viable/strict/1764696355 2025-12-04T08:57:06.6694388Z * [new tag] viable/strict/1764701767 -> viable/strict/1764701767 2025-12-04T08:57:06.6695639Z * [new tag] viable/strict/1764710768 -> viable/strict/1764710768 2025-12-04T08:57:06.6696800Z * [new tag] viable/strict/1764716202 -> viable/strict/1764716202 2025-12-04T08:57:06.6698082Z * [new tag] viable/strict/1764793566 -> viable/strict/1764793566 2025-12-04T08:57:06.6699228Z * [new tag] viable/strict/1764797093 -> viable/strict/1764797093 2025-12-04T08:57:06.6700535Z * [new tag] viable/strict/1764800729 -> viable/strict/1764800729 2025-12-04T08:57:06.6701851Z * [new tag] whc_flight_1 -> whc_flight_1 2025-12-04T08:57:06.6703222Z * [new tag] whc_flight_2 -> whc_flight_2 2025-12-04T08:57:06.6704648Z * [new tag] whc_flight_4 -> whc_flight_4 2025-12-04T08:57:06.7549211Z [command]/usr/bin/git rev-parse --verify --quiet 
ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T08:57:06.7576907Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:06.7581561Z ##[endgroup] 2025-12-04T08:57:06.7581917Z ##[group]Determining the checkout info 2025-12-04T08:57:06.7582992Z ##[endgroup] 2025-12-04T08:57:06.7587478Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T08:57:06.7624360Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T08:57:06.7650744Z ##[group]Checking out the ref 2025-12-04T08:57:06.7654711Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:07.8195896Z Updating files: 75% (15153/20121) 2025-12-04T08:57:07.8353066Z Updating files: 76% (15292/20121) 2025-12-04T08:57:07.8503864Z Updating files: 77% (15494/20121) 2025-12-04T08:57:07.8714854Z Updating files: 78% (15695/20121) 2025-12-04T08:57:07.8973352Z Updating files: 79% (15896/20121) 2025-12-04T08:57:07.9280649Z Updating files: 80% (16097/20121) 2025-12-04T08:57:07.9560573Z Updating files: 81% (16299/20121) 2025-12-04T08:57:07.9783983Z Updating files: 82% (16500/20121) 2025-12-04T08:57:07.9952536Z Updating files: 83% (16701/20121) 2025-12-04T08:57:08.0107869Z Updating files: 84% (16902/20121) 2025-12-04T08:57:08.0284937Z Updating files: 85% (17103/20121) 2025-12-04T08:57:08.0456387Z Updating files: 86% (17305/20121) 2025-12-04T08:57:08.0615676Z Updating files: 87% (17506/20121) 2025-12-04T08:57:08.0748301Z Updating files: 88% (17707/20121) 2025-12-04T08:57:08.0900816Z Updating files: 89% (17908/20121) 2025-12-04T08:57:08.1085251Z Updating files: 90% (18109/20121) 2025-12-04T08:57:08.1220953Z Updating files: 91% (18311/20121) 2025-12-04T08:57:08.1388361Z Updating files: 92% (18512/20121) 2025-12-04T08:57:08.1581249Z Updating files: 93% (18713/20121) 2025-12-04T08:57:08.1790415Z Updating files: 94% (18914/20121) 2025-12-04T08:57:08.1975645Z Updating files: 95% (19115/20121) 2025-12-04T08:57:08.2150258Z Updating files: 
96% (19317/20121) 2025-12-04T08:57:08.2325808Z Updating files: 97% (19518/20121) 2025-12-04T08:57:08.2604839Z Updating files: 98% (19719/20121) 2025-12-04T08:57:08.2790512Z Updating files: 99% (19920/20121) 2025-12-04T08:57:08.2790825Z Updating files: 100% (20121/20121) 2025-12-04T08:57:08.2791100Z Updating files: 100% (20121/20121), done. 2025-12-04T08:57:08.3023309Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'. 2025-12-04T08:57:08.3023625Z 2025-12-04T08:57:08.3023839Z You are in 'detached HEAD' state. You can look around, make experimental 2025-12-04T08:57:08.3024347Z changes and commit them, and you can discard any commits you make in this 2025-12-04T08:57:08.3024842Z state without impacting any branches by switching back to a branch. 2025-12-04T08:57:08.3025126Z 2025-12-04T08:57:08.3025317Z If you want to create a new branch to retain commits you create, you may 2025-12-04T08:57:08.3025800Z do so (now or later) by using -c with the switch command. Example: 2025-12-04T08:57:08.3026067Z 2025-12-04T08:57:08.3026188Z git switch -c <new-branch-name> 2025-12-04T08:57:08.3026376Z 2025-12-04T08:57:08.3026475Z Or undo this operation with: 2025-12-04T08:57:08.3026637Z 2025-12-04T08:57:08.3026725Z git switch - 2025-12-04T08:57:08.3026848Z 2025-12-04T08:57:08.3027061Z Turn off this advice by setting config variable advice.detachedHead to false 2025-12-04T08:57:08.3027374Z 2025-12-04T08:57:08.3027615Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T08:57:08.3148174Z ##[endgroup] 2025-12-04T08:57:08.3148852Z ##[group]Setting up auth for fetching submodules 2025-12-04T08:57:08.3155388Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:57:08.3208802Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T08:57:08.3241783Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 
2025-12-04T08:57:08.3270525Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T08:57:08.3295695Z ##[endgroup] 2025-12-04T08:57:08.3296513Z ##[group]Fetching submodules 2025-12-04T08:57:08.3300754Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T08:57:08.3643921Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T08:57:08.3975077Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2025-12-04T08:57:08.3977567Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-12-04T08:57:08.3980670Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-12-04T08:57:08.3983974Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK' 2025-12-04T08:57:08.3987295Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-12-04T08:57:08.3991045Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-12-04T08:57:08.3994436Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-12-04T08:57:08.3998073Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter' 2025-12-04T08:57:08.4002148Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-12-04T08:57:08.4006101Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-12-04T08:57:08.4010052Z 
Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-12-04T08:57:08.4013791Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-12-04T08:57:08.4018449Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-12-04T08:57:08.4022540Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-12-04T08:57:08.4026709Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-12-04T08:57:08.4031894Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-12-04T08:57:08.4039094Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2025-12-04T08:57:08.4043670Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-12-04T08:57:08.4048258Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:57:08.4052645Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-12-04T08:57:08.4057363Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-12-04T08:57:08.4061877Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-12-04T08:57:08.4066694Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-12-04T08:57:08.4071487Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for 
path 'third_party/kineto' 2025-12-04T08:57:08.4076509Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-12-04T08:57:08.4081582Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-12-04T08:57:08.4086582Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-12-04T08:57:08.4091668Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-12-04T08:57:08.4097122Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-12-04T08:57:08.4102429Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-12-04T08:57:08.4107840Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-12-04T08:57:08.4113219Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-12-04T08:57:08.4119279Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-12-04T08:57:08.4127989Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-12-04T08:57:08.4133975Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-12-04T08:57:08.4139466Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-12-04T08:57:08.4145408Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 
'third_party/tensorpipe' 2025-12-04T08:57:08.4176817Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-12-04T08:57:08.6386845Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-12-04T08:57:08.6387751Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2025-12-04T08:57:08.6388366Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-12-04T08:57:08.6414886Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2025-12-04T08:57:11.2196480Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-12-04T08:57:11.2198521Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-12-04T08:57:11.2200673Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 2025-12-04T08:57:11.2201882Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-12-04T08:57:11.2204864Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-12-04T08:57:11.2206096Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-12-04T08:57:11.2329938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-12-04T08:57:11.2331019Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2025-12-04T08:57:11.2332829Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-12-04T08:57:11.2334518Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-12-04T08:57:11.2336416Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 
2025-12-04T08:57:11.2338271Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-12-04T08:57:11.2339316Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2025-12-04T08:57:11.2340953Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2025-12-04T08:57:11.2342838Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-12-04T08:57:11.2343868Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-12-04T08:57:11.2345727Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-12-04T08:57:11.3531369Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2025-12-04T08:57:11.5907107Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-12-04T08:57:11.5908215Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-12-04T08:57:11.6908412Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2025-12-04T08:57:13.7246773Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-12-04T08:57:13.7248124Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2025-12-04T08:57:13.7249202Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2025-12-04T08:57:13.7250246Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-12-04T08:57:13.7251324Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-12-04T08:57:13.8247710Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 
2025-12-04T08:57:29.7475729Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-12-04T08:57:29.7476363Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-12-04T08:57:29.7476926Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-12-04T08:57:29.7477736Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'... 2025-12-04T08:57:29.7478442Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2025-12-04T08:57:29.7666550Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T08:57:29.7821442Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T08:57:29.7935380Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T08:57:29.8250298Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T08:57:29.9149981Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T08:57:29.9677818Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T08:57:30.8528151Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T08:57:31.0470354Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T08:57:31.0493404Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:57:31.0523917Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
2025-12-04T08:57:35.3442309Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T08:57:35.3718480Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T08:57:35.7847957Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:57:35.8361872Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T08:57:35.9364607Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T08:57:35.9878373Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T08:57:36.7184909Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T08:57:36.8936918Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T08:57:36.8970392Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-12-04T08:57:36.8972992Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:57:36.8976417Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:57:36.8980009Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-12-04T08:57:36.8983646Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-12-04T08:57:36.8987370Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 
'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:57:36.8991033Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-12-04T08:57:36.9023460Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-12-04T08:57:38.0163939Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-12-04T08:57:38.0165389Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-12-04T08:57:38.0166518Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-12-04T08:57:38.1165070Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-12-04T08:57:41.2468646Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-12-04T08:57:41.3470433Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-12-04T08:57:43.2628130Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T08:57:43.6654032Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:57:43.7674460Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T08:57:44.4624001Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T08:57:44.5109937Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:57:44.5254529Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T08:57:44.6383400Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T08:57:44.7205750Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T08:57:44.7228038Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:57:44.7231047Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:57:44.7259623Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-12-04T08:57:48.7535150Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 
2025-12-04T08:57:49.0442030Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T08:57:49.6617859Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T08:57:49.8204982Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T08:57:49.8527581Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T08:57:49.8959766Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T08:57:49.9271154Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T08:57:49.9740441Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:57:49.9896043Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T08:57:49.9914719Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-12-04T08:57:49.9943791Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-12-04T08:58:03.9759633Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T08:58:03.9994932Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T08:58:04.0853178Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T08:58:04.0882322Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:04.0885109Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:04.0888540Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:04.0920020Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-12-04T08:58:04.7102217Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-12-04T08:58:05.1978087Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2025-12-04T08:58:05.2904193Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T08:58:05.2926934Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:05.2929841Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:05.2933208Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:05.2936654Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:05.2940240Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:05.2944092Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:05.2948601Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:05.2952570Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:05.2956733Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:05.2989679Z Cloning into 
'/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-12-04T08:58:07.1853228Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 2025-12-04T08:58:07.1854771Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-12-04T08:58:07.1855933Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'... 2025-12-04T08:58:07.1856962Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-12-04T08:58:07.1857941Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-12-04T08:58:07.1858919Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-12-04T08:58:07.1859926Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-12-04T08:58:07.2853644Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 
2025-12-04T08:58:11.7170787Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T08:58:11.7394747Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T08:58:11.7790463Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T08:58:11.7958600Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T08:58:11.7978124Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:11.8012172Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-12-04T08:58:12.0499428Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T08:58:12.0720249Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T08:58:12.1190978Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:12.2280709Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T08:58:12.2476214Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T08:58:12.2687080Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T08:58:12.2707064Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:12.2710663Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:12.2741544Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:58:14.2156198Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T08:58:14.4620901Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T08:58:14.5122765Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:58:14.5462576Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T08:58:14.5934421Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:14.6543064Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T08:58:14.6969839Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T08:58:14.8232150Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T08:58:15.3579044Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T08:58:15.3616528Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:15.3647360Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-12-04T08:58:16.0820273Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T08:58:16.1662810Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T08:58:16.1684089Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:16.1687352Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:16.1690720Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:16.1694258Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:16.1697997Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:16.1701583Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:16.1705252Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:16.1708901Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:16.1740867Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 
2025-12-04T08:58:16.5626403Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-12-04T08:58:16.5627342Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-12-04T08:58:16.5628201Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-12-04T08:58:16.5628979Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-12-04T08:58:16.6627673Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-12-04T08:58:17.1708537Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-12-04T08:58:23.2784993Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 
2025-12-04T08:58:23.8520494Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T08:58:23.8955203Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T08:58:23.9139834Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T08:58:24.0300665Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T08:58:24.0461878Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T08:58:24.0642298Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T08:58:24.0838534Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T08:58:24.0865543Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:24.0868860Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:24.0898479Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:58:26.0557327Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T08:58:26.3024922Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T08:58:26.3516362Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:58:26.9902498Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T08:58:27.0047191Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T08:58:27.2967524Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T08:58:27.2992991Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:58:27.2996313Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-12-04T08:58:27.3026784Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-12-04T08:58:27.8021981Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-12-04T08:58:28.0774924Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T08:58:28.1498878Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T08:58:28.1624039Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T08:58:28.1767419Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T08:58:28.2265034Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T08:58:28.2577705Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T08:58:28.3036668Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T08:58:28.3401144Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T08:58:28.3421632Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:58:28.3424890Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:58:28.3428378Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:58:28.3432007Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:58:28.3463424Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-12-04T08:58:29.2932598Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 
2025-12-04T08:58:29.2933499Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-12-04T08:58:29.3015033Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-12-04T08:58:29.3591326Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T08:58:29.3781636Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T08:58:29.4544013Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T08:58:29.4867804Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T08:58:29.4886392Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:58:29.4915890Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-12-04T08:58:29.6639431Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T08:58:29.6693220Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T08:58:29.7037457Z Entering 'android/libs/fbjni' 2025-12-04T08:58:29.7084320Z Entering 'third_party/FP16' 2025-12-04T08:58:29.7131687Z Entering 'third_party/FXdiv' 2025-12-04T08:58:29.7179821Z Entering 'third_party/NNPACK' 2025-12-04T08:58:29.7234578Z Entering 'third_party/NVTX' 2025-12-04T08:58:29.7282524Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:58:29.7331226Z Entering 'third_party/XNNPACK' 2025-12-04T08:58:29.7392914Z Entering 'third_party/aiter' 2025-12-04T08:58:29.7441161Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:29.7504836Z Entering 'third_party/benchmark' 2025-12-04T08:58:29.7557777Z Entering 'third_party/composable_kernel' 2025-12-04T08:58:29.7620742Z Entering 'third_party/cpp-httplib' 2025-12-04T08:58:29.7668076Z Entering 'third_party/cpuinfo' 2025-12-04T08:58:29.7714593Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:58:29.7764458Z Entering 'third_party/cutlass' 2025-12-04T08:58:29.7820991Z Entering 'third_party/fbgemm' 2025-12-04T08:58:29.7870914Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:29.7919350Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:29.7975149Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:29.8022412Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:29.8077995Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:29.8129645Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:29.8177266Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:58:29.8235550Z Entering 'third_party/flash-attention' 2025-12-04T08:58:29.8281737Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-12-04T08:58:29.8336069Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:29.8392544Z Entering 'third_party/flatbuffers' 2025-12-04T08:58:29.8446062Z Entering 'third_party/fmt' 2025-12-04T08:58:29.8492913Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:58:29.8542099Z Entering 'third_party/gloo' 2025-12-04T08:58:29.8590512Z Entering 'third_party/googletest' 2025-12-04T08:58:29.8641160Z Entering 'third_party/ideep' 2025-12-04T08:58:29.8687699Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:29.8741236Z Entering 'third_party/ittapi' 2025-12-04T08:58:29.8791897Z Entering 'third_party/kineto' 2025-12-04T08:58:29.8838107Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:29.8885808Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:29.8935079Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:29.8981569Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:29.9030697Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:29.9075997Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:29.9126427Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:29.9173275Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:29.9221363Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:29.9268676Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:29.9314057Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:29.9360324Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:29.9410847Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:29.9468743Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:29.9520387Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:29.9571353Z Entering 'third_party/kleidiai' 2025-12-04T08:58:29.9620479Z Entering 'third_party/mimalloc' 2025-12-04T08:58:29.9667041Z Entering 'third_party/nlohmann' 2025-12-04T08:58:29.9715210Z Entering 'third_party/onnx' 2025-12-04T08:58:29.9777887Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:29.9828613Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:58:29.9877331Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:29.9924404Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:29.9971389Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:30.0019256Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:30.0065400Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:30.0111338Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:30.0162542Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:30.0208836Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:30.0259663Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:30.0310973Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:30.0378679Z Entering 'third_party/pocketfft' 2025-12-04T08:58:30.0428324Z Entering 'third_party/protobuf' 2025-12-04T08:58:30.0484966Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:58:30.0532109Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:58:30.0582849Z Entering 'third_party/psimd' 
2025-12-04T08:58:30.0631895Z Entering 'third_party/pthreadpool' 2025-12-04T08:58:30.0680732Z Entering 'third_party/pybind11' 2025-12-04T08:58:30.0734666Z Entering 'third_party/python-peachpy' 2025-12-04T08:58:30.0781691Z Entering 'third_party/sleef' 2025-12-04T08:58:30.0830567Z Entering 'third_party/tensorpipe' 2025-12-04T08:58:30.0882802Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:58:30.0930370Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:58:30.0977994Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:58:30.1023468Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:58:30.1068873Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:58:30.1134779Z ##[endgroup] 2025-12-04T08:58:30.1135214Z ##[group]Persisting credentials for submodules 2025-12-04T08:58:30.1140286Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T08:58:30.1465995Z Entering 'android/libs/fbjni' 2025-12-04T08:58:30.1530465Z Entering 'third_party/FP16' 2025-12-04T08:58:30.1593347Z Entering 'third_party/FXdiv' 2025-12-04T08:58:30.1658559Z Entering 'third_party/NNPACK' 2025-12-04T08:58:30.1723644Z Entering 'third_party/NVTX' 2025-12-04T08:58:30.1790819Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:58:30.1853941Z Entering 'third_party/XNNPACK' 2025-12-04T08:58:30.1932061Z Entering 'third_party/aiter' 2025-12-04T08:58:30.2001060Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:30.2078115Z Entering 'third_party/benchmark' 2025-12-04T08:58:30.2140744Z Entering 'third_party/composable_kernel' 2025-12-04T08:58:30.2210520Z Entering 'third_party/cpp-httplib' 2025-12-04T08:58:30.2272281Z Entering 'third_party/cpuinfo' 2025-12-04T08:58:30.2336396Z Entering 
'third_party/cudnn_frontend' 2025-12-04T08:58:30.2399742Z Entering 'third_party/cutlass' 2025-12-04T08:58:30.2472541Z Entering 'third_party/fbgemm' 2025-12-04T08:58:30.2536794Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:30.2598986Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:30.2670437Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:30.2736186Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:30.2809370Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:30.2871707Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:30.2942025Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:58:30.3015548Z Entering 'third_party/flash-attention' 2025-12-04T08:58:30.3084669Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:30.3153691Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:30.3233965Z Entering 'third_party/flatbuffers' 2025-12-04T08:58:30.3303528Z Entering 'third_party/fmt' 2025-12-04T08:58:30.3364092Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:58:30.3428845Z Entering 'third_party/gloo' 2025-12-04T08:58:30.3492337Z Entering 'third_party/googletest' 2025-12-04T08:58:30.3566080Z Entering 'third_party/ideep' 2025-12-04T08:58:30.3628463Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:30.3699704Z Entering 'third_party/ittapi' 2025-12-04T08:58:30.3771567Z Entering 'third_party/kineto' 2025-12-04T08:58:30.3834149Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:30.3902247Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:30.3976173Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:30.4040586Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:30.4109778Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 
2025-12-04T08:58:30.4171504Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:30.4244319Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:30.4310599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:30.4380023Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:30.4444346Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:30.4509219Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:30.4571550Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:30.4637319Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:30.4711139Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:30.4783062Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:30.4857028Z Entering 'third_party/kleidiai' 2025-12-04T08:58:30.4922575Z Entering 'third_party/mimalloc' 2025-12-04T08:58:30.4985797Z Entering 'third_party/nlohmann' 2025-12-04T08:58:30.5051160Z Entering 'third_party/onnx' 2025-12-04T08:58:30.5129067Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:30.5206359Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:58:30.5270358Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:30.5332625Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:30.5398745Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:30.5460692Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:30.5532895Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:30.5603611Z 
Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:30.5669246Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:30.5738681Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:30.5809664Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:30.5875386Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:30.5957456Z Entering 'third_party/pocketfft' 2025-12-04T08:58:30.6034273Z Entering 'third_party/protobuf' 2025-12-04T08:58:30.6107667Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:58:30.6177611Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:58:30.6245493Z Entering 'third_party/psimd' 2025-12-04T08:58:30.6312729Z Entering 'third_party/pthreadpool' 2025-12-04T08:58:30.6382448Z Entering 'third_party/pybind11' 2025-12-04T08:58:30.6448625Z Entering 'third_party/python-peachpy' 2025-12-04T08:58:30.6512929Z Entering 'third_party/sleef' 2025-12-04T08:58:30.6577902Z Entering 'third_party/tensorpipe' 2025-12-04T08:58:30.6641273Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:58:30.6702981Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:58:30.6774232Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:58:30.6835980Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:58:30.6897608Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:58:30.6987725Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T08:58:30.7322174Z Entering 'android/libs/fbjni' 2025-12-04T08:58:30.7386944Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T08:58:30.7401952Z Entering 'third_party/FP16' 2025-12-04T08:58:30.7460467Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T08:58:30.7480783Z Entering 'third_party/FXdiv' 2025-12-04T08:58:30.7540553Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T08:58:30.7560728Z Entering 'third_party/NNPACK' 2025-12-04T08:58:30.7619498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T08:58:30.7640409Z Entering 'third_party/NVTX' 2025-12-04T08:58:30.7703305Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T08:58:30.7724658Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:58:30.7783400Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T08:58:30.7803291Z Entering 'third_party/XNNPACK' 2025-12-04T08:58:30.7862081Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T08:58:30.7895726Z Entering 'third_party/aiter' 2025-12-04T08:58:30.7958797Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T08:58:30.7978998Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:30.8037022Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T08:58:30.8066263Z Entering 'third_party/benchmark' 2025-12-04T08:58:30.8123708Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:58:30.8143637Z Entering 'third_party/composable_kernel' 2025-12-04T08:58:30.8202483Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T08:58:30.8230826Z Entering 'third_party/cpp-httplib' 2025-12-04T08:58:30.8295739Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T08:58:30.8321123Z Entering 'third_party/cpuinfo' 2025-12-04T08:58:30.8383157Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T08:58:30.8403781Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:58:30.8460555Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T08:58:30.8481064Z Entering 'third_party/cutlass' 2025-12-04T08:58:30.8544224Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T08:58:30.8572078Z Entering 'third_party/fbgemm' 2025-12-04T08:58:30.8629310Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T08:58:30.8650838Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:30.8708865Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T08:58:30.8727413Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:30.8784326Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T08:58:30.8810489Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:30.8869490Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T08:58:30.8889652Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:30.8947526Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T08:58:30.8975237Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:30.9032794Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T08:58:30.9051918Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:30.9111903Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T08:58:30.9131445Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:58:30.9202009Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T08:58:30.9225555Z Entering 'third_party/flash-attention' 2025-12-04T08:58:30.9283255Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T08:58:30.9302194Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:30.9372423Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T08:58:30.9397195Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:30.9462473Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T08:58:30.9491035Z Entering 'third_party/flatbuffers' 2025-12-04T08:58:30.9549396Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T08:58:30.9571964Z Entering 'third_party/fmt' 2025-12-04T08:58:30.9631502Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:58:30.9651565Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:58:30.9711301Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T08:58:30.9731718Z Entering 'third_party/gloo' 2025-12-04T08:58:30.9790797Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T08:58:30.9811040Z Entering 'third_party/googletest' 2025-12-04T08:58:30.9870715Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:58:30.9890803Z Entering 'third_party/ideep' 2025-12-04T08:58:30.9956276Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T08:58:30.9978697Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:31.0047054Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T08:58:31.0074541Z Entering 'third_party/ittapi' 2025-12-04T08:58:31.0137409Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T08:58:31.0157905Z Entering 'third_party/kineto' 2025-12-04T08:58:31.0217965Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T08:58:31.0237864Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:31.0298723Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T08:58:31.0318048Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:31.0378742Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T08:58:31.0400582Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:31.0458903Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T08:58:31.0479394Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:31.0540276Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:58:31.0560480Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:31.0620748Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T08:58:31.0639155Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:31.0701198Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T08:58:31.0724600Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:31.0783431Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T08:58:31.0803175Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:31.0863931Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:58:31.0883617Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:31.0943486Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T08:58:31.0963875Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:31.1023363Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T08:58:31.1041783Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:31.1107679Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:58:31.1125392Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:31.1184524Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:58:31.1206249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:31.1265399Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:58:31.1289766Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:31.1349941Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:58:31.1369917Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:31.1429144Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:58:31.1458683Z Entering 'third_party/kleidiai' 2025-12-04T08:58:31.1518375Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T08:58:31.1539326Z Entering 'third_party/mimalloc' 2025-12-04T08:58:31.1597756Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:58:31.1618742Z Entering 'third_party/nlohmann' 2025-12-04T08:58:31.1678225Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:58:31.1699849Z Entering 'third_party/onnx' 2025-12-04T08:58:31.1768995Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:58:31.1807841Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:31.1864942Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:58:31.1887891Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:58:31.1947197Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:58:31.1971614Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:31.2029443Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:58:31.2049445Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:31.2106807Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:58:31.2126757Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:31.2183903Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:58:31.2202809Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:31.2264767Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:58:31.2284745Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:31.2343475Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:58:31.2362290Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:31.2425370Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:58:31.2443986Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:31.2502462Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:58:31.2520415Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:31.2579161Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:58:31.2606245Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:31.2664290Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:58:31.2685903Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:31.2742934Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:58:31.2780469Z Entering 'third_party/pocketfft' 2025-12-04T08:58:31.2842838Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T08:58:31.2862165Z Entering 'third_party/protobuf' 2025-12-04T08:58:31.2922108Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:58:31.2943618Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:58:31.3001961Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:58:31.3022126Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:58:31.3088177Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 
2025-12-04T08:58:31.3110890Z Entering 'third_party/psimd' 2025-12-04T08:58:31.3176109Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:58:31.3198106Z Entering 'third_party/pthreadpool' 2025-12-04T08:58:31.3256165Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T08:58:31.3276984Z Entering 'third_party/pybind11' 2025-12-04T08:58:31.3334589Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:58:31.3354049Z Entering 'third_party/python-peachpy' 2025-12-04T08:58:31.3411466Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:58:31.3431788Z Entering 'third_party/sleef' 2025-12-04T08:58:31.3489741Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:58:31.3510069Z Entering 'third_party/tensorpipe' 2025-12-04T08:58:31.3577397Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:58:31.3596603Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:58:31.3661375Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:58:31.3680847Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:58:31.3748499Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:58:31.3768321Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:58:31.3824334Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:58:31.3843475Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:58:31.3900889Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:58:31.3919196Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:58:31.3978989Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T08:58:31.5029027Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T08:58:31.5354589Z Entering 'android/libs/fbjni' 2025-12-04T08:58:31.5402390Z Entering 'third_party/FP16' 2025-12-04T08:58:31.5450166Z Entering 'third_party/FXdiv' 2025-12-04T08:58:31.5499148Z Entering 'third_party/NNPACK' 2025-12-04T08:58:31.5548708Z Entering 'third_party/NVTX' 2025-12-04T08:58:31.5595146Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:58:31.5643115Z Entering 'third_party/XNNPACK' 2025-12-04T08:58:31.5704506Z Entering 'third_party/aiter' 2025-12-04T08:58:31.5751999Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:31.5808444Z Entering 'third_party/benchmark' 2025-12-04T08:58:31.5857069Z Entering 'third_party/composable_kernel' 2025-12-04T08:58:31.5910534Z Entering 'third_party/cpp-httplib' 2025-12-04T08:58:31.5959189Z Entering 'third_party/cpuinfo' 2025-12-04T08:58:31.6008626Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:58:31.6057111Z Entering 'third_party/cutlass' 2025-12-04T08:58:31.6111035Z Entering 'third_party/fbgemm' 2025-12-04T08:58:31.6163757Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:31.6211138Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:31.6263692Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:31.6311833Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:31.6370782Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:31.6419485Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:31.6464604Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:58:31.6515387Z Entering 'third_party/flash-attention' 2025-12-04T08:58:31.6562735Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:31.6616110Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:31.6675513Z Entering 'third_party/flatbuffers' 2025-12-04T08:58:31.6734997Z Entering 'third_party/fmt' 2025-12-04T08:58:31.6782784Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:58:31.6830408Z Entering 'third_party/gloo' 2025-12-04T08:58:31.6880698Z Entering 'third_party/googletest' 2025-12-04T08:58:31.6934039Z Entering 'third_party/ideep' 2025-12-04T08:58:31.6980169Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:31.7037500Z Entering 'third_party/ittapi' 2025-12-04T08:58:31.7084048Z Entering 'third_party/kineto' 2025-12-04T08:58:31.7131655Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:31.7178764Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:31.7230870Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:31.7280202Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:31.7337432Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:31.7382151Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:31.7434045Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:31.7481838Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:31.7531507Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:31.7580817Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:31.7633599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:31.7680260Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:31.7733906Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:31.7788455Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:31.7840397Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:31.7899136Z Entering 'third_party/kleidiai' 2025-12-04T08:58:31.7950030Z Entering 'third_party/mimalloc' 2025-12-04T08:58:31.7999637Z Entering 'third_party/nlohmann' 2025-12-04T08:58:31.8050910Z Entering 'third_party/onnx' 2025-12-04T08:58:31.8113668Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:31.8168721Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:58:31.8222486Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:31.8270255Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:31.8324637Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:31.8371163Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:31.8424127Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:31.8470842Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:31.8518836Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:31.8562849Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:31.8611909Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:31.8661159Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:31.8730548Z Entering 'third_party/pocketfft' 2025-12-04T08:58:31.8779380Z Entering 'third_party/protobuf' 2025-12-04T08:58:31.8831031Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:58:31.8878906Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:58:31.8935703Z Entering 'third_party/psimd' 2025-12-04T08:58:31.8982635Z Entering 'third_party/pthreadpool' 2025-12-04T08:58:31.9031458Z Entering 'third_party/pybind11' 2025-12-04T08:58:31.9088206Z Entering 'third_party/python-peachpy' 2025-12-04T08:58:31.9138721Z Entering 'third_party/sleef' 2025-12-04T08:58:31.9188939Z Entering 'third_party/tensorpipe' 2025-12-04T08:58:31.9237856Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:58:31.9283642Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:58:31.9330914Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:58:31.9377639Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:58:31.9422905Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:58:31.9492918Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T08:58:31.9834562Z Entering 'android/libs/fbjni' 2025-12-04T08:58:31.9881915Z Entering 'third_party/FP16' 2025-12-04T08:58:31.9931724Z Entering 'third_party/FXdiv' 2025-12-04T08:58:31.9980009Z Entering 'third_party/NNPACK' 2025-12-04T08:58:32.0035658Z Entering 'third_party/NVTX' 2025-12-04T08:58:32.0084073Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:58:32.0131527Z Entering 'third_party/XNNPACK' 2025-12-04T08:58:32.0193727Z Entering 
'third_party/aiter' 2025-12-04T08:58:32.0240264Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:32.0298221Z Entering 'third_party/benchmark' 2025-12-04T08:58:32.0345511Z Entering 'third_party/composable_kernel' 2025-12-04T08:58:32.0400577Z Entering 'third_party/cpp-httplib' 2025-12-04T08:58:32.0457942Z Entering 'third_party/cpuinfo' 2025-12-04T08:58:32.0510890Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:58:32.0572102Z Entering 'third_party/cutlass' 2025-12-04T08:58:32.0629905Z Entering 'third_party/fbgemm' 2025-12-04T08:58:32.0688112Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:32.0734248Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:32.0788405Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:32.0834771Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:32.0889327Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:32.0934861Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:32.0981475Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:58:32.1037754Z Entering 'third_party/flash-attention' 2025-12-04T08:58:32.1087191Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:32.1139295Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:32.1195658Z Entering 'third_party/flatbuffers' 2025-12-04T08:58:32.1246847Z Entering 'third_party/fmt' 2025-12-04T08:58:32.1293641Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:58:32.1342668Z Entering 'third_party/gloo' 2025-12-04T08:58:32.1390863Z Entering 'third_party/googletest' 2025-12-04T08:58:32.1444910Z Entering 'third_party/ideep' 2025-12-04T08:58:32.1490850Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:32.1546314Z Entering 'third_party/ittapi' 2025-12-04T08:58:32.1594026Z Entering 'third_party/kineto' 2025-12-04T08:58:32.1640587Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 
2025-12-04T08:58:32.1692501Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:32.1742592Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:32.1791293Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:32.1842864Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:32.1890023Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:32.1938589Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:32.1984891Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:32.2032695Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:32.2082248Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:32.2135306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:32.2181097Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:32.2232932Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:32.2289147Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:32.2339496Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:32.2395346Z Entering 'third_party/kleidiai' 2025-12-04T08:58:32.2448557Z Entering 'third_party/mimalloc' 2025-12-04T08:58:32.2494746Z Entering 'third_party/nlohmann' 2025-12-04T08:58:32.2543736Z Entering 'third_party/onnx' 2025-12-04T08:58:32.2606340Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:32.2657751Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:58:32.2705850Z Entering 
'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:32.2753491Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:32.2800929Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:32.2853327Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:32.2901731Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:32.2954510Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:32.3001111Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:32.3061030Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:32.3110892Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:32.3165382Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:32.3230851Z Entering 'third_party/pocketfft' 2025-12-04T08:58:32.3287764Z Entering 'third_party/protobuf' 2025-12-04T08:58:32.3335861Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:58:32.3381722Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:58:32.3438514Z Entering 'third_party/psimd' 2025-12-04T08:58:32.3488464Z Entering 'third_party/pthreadpool' 2025-12-04T08:58:32.3534428Z Entering 'third_party/pybind11' 2025-12-04T08:58:32.3582317Z Entering 'third_party/python-peachpy' 2025-12-04T08:58:32.3631259Z Entering 'third_party/sleef' 2025-12-04T08:58:32.3683430Z Entering 'third_party/tensorpipe' 2025-12-04T08:58:32.3730780Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:58:32.3778466Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:58:32.3827600Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:58:32.3873552Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:58:32.3920492Z Entering 
'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:58:32.3993784Z ##[endgroup] 2025-12-04T08:58:32.4033942Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T08:58:32.4056916Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:58:32.4177790Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-12-04T08:58:32.4178052Z cd "${GITHUB_WORKSPACE}" 2025-12-04T08:58:32.4178285Z # Clean stale submodule dirs 2025-12-04T08:58:32.4178518Z if [ -z "${NO_SUDO}" ]; then 2025-12-04T08:58:32.4178792Z  sudo git submodule foreach --recursive git clean -ffdx 2025-12-04T08:58:32.4179223Z else 2025-12-04T08:58:32.4179564Z  git submodule foreach --recursive git clean -ffdx 2025-12-04T08:58:32.4179823Z fi 2025-12-04T08:58:32.4189029Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:32.4189318Z env: 2025-12-04T08:58:32.4189477Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:32.4189653Z NO_SUDO: true 2025-12-04T08:58:32.4189820Z ##[endgroup] 2025-12-04T08:58:32.4541310Z Entering 'android/libs/fbjni' 2025-12-04T08:58:32.4579860Z Entering 'third_party/FP16' 2025-12-04T08:58:32.4624083Z Entering 'third_party/FXdiv' 2025-12-04T08:58:32.4659376Z Entering 'third_party/NNPACK' 2025-12-04T08:58:32.4700288Z Entering 'third_party/NVTX' 2025-12-04T08:58:32.4744473Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:58:32.4781868Z Entering 'third_party/XNNPACK' 2025-12-04T08:58:32.4906725Z Entering 'third_party/aiter' 2025-12-04T08:58:32.4952810Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:32.5069263Z Entering 'third_party/benchmark' 2025-12-04T08:58:32.5108228Z Entering 'third_party/composable_kernel' 2025-12-04T08:58:32.5231518Z Entering 'third_party/cpp-httplib' 2025-12-04T08:58:32.5285926Z Entering 'third_party/cpuinfo' 2025-12-04T08:58:32.5325301Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:58:32.5364192Z Entering 'third_party/cutlass' 2025-12-04T08:58:32.5471297Z Entering 'third_party/fbgemm' 
2025-12-04T08:58:32.5545651Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:32.5581723Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:32.5710144Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:32.5750860Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:32.5857984Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:32.5901007Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:32.5940945Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:58:32.5994610Z Entering 'third_party/flash-attention' 2025-12-04T08:58:32.6037833Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:32.6140435Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:32.6234599Z Entering 'third_party/flatbuffers' 2025-12-04T08:58:32.6317189Z Entering 'third_party/fmt' 2025-12-04T08:58:32.6353766Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:58:32.6391880Z Entering 'third_party/gloo' 2025-12-04T08:58:32.6430972Z Entering 'third_party/googletest' 2025-12-04T08:58:32.6482649Z Entering 'third_party/ideep' 2025-12-04T08:58:32.6518401Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:32.6612218Z Entering 'third_party/ittapi' 2025-12-04T08:58:32.6651262Z Entering 'third_party/kineto' 2025-12-04T08:58:32.6690950Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:32.6732814Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:32.6783295Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:32.6824528Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:32.6861545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:32.6893224Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:32.6937214Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:32.6972881Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:32.7011326Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:32.7064867Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:32.7100743Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:32.7141511Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:32.7195570Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:32.7239591Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:32.7283915Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:32.7324496Z Entering 'third_party/kleidiai' 2025-12-04T08:58:32.7371899Z Entering 'third_party/mimalloc' 2025-12-04T08:58:32.7409514Z Entering 'third_party/nlohmann' 2025-12-04T08:58:32.7461016Z Entering 'third_party/onnx' 2025-12-04T08:58:32.7808045Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:32.7850908Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:58:32.7920724Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:32.7958038Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:32.8003605Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:32.8039355Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:32.8094532Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:32.8131679Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:32.8168823Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:32.8204479Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:32.8256084Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:32.8296363Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:32.8572859Z Entering 'third_party/pocketfft' 2025-12-04T08:58:32.8612704Z Entering 'third_party/protobuf' 2025-12-04T08:58:32.8702817Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:58:32.8739174Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:58:32.8782312Z Entering 'third_party/psimd' 2025-12-04T08:58:32.8820906Z Entering 'third_party/pthreadpool' 2025-12-04T08:58:32.8857990Z Entering 'third_party/pybind11' 2025-12-04T08:58:32.8906686Z Entering 'third_party/python-peachpy' 2025-12-04T08:58:32.8942055Z Entering 'third_party/sleef' 2025-12-04T08:58:32.8981301Z Entering 'third_party/tensorpipe' 2025-12-04T08:58:32.9021589Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:58:32.9059216Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:58:32.9100124Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:58:32.9141380Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:58:32.9180626Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:58:32.9333558Z Prepare all required actions 2025-12-04T08:58:32.9334049Z Getting action download info 2025-12-04T08:58:33.0827904Z ##[group]Run ./.github/actions/setup-linux 2025-12-04T08:58:33.0828134Z env: 2025-12-04T08:58:33.0828294Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:33.0828478Z ##[endgroup] 2025-12-04T08:58:33.0860950Z ##[group]Run set -euo pipefail 2025-12-04T08:58:33.0861223Z set -euo pipefail 2025-12-04T08:58:33.0861435Z function get_ec2_metadata() { 2025-12-04T08:58:33.0861704Z  # Pulled from instance 
metadata endpoint for EC2 2025-12-04T08:58:33.0862167Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-12-04T08:58:33.0862566Z  category=$1 2025-12-04T08:58:33.0862827Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-12-04T08:58:33.0863128Z  runner_name_str=i-0e5520d20214059b0 2025-12-04T08:58:33.0863400Z  if [[ -f /.inarc ]]; then 2025-12-04T08:58:33.0863644Z  echo "ARC Runner, no info on ec2 metadata" 2025-12-04T08:58:33.0863918Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-12-04T08:58:33.0864245Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-12-04T08:58:33.0864551Z  else 2025-12-04T08:58:33.0865324Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-12-04T08:58:33.0865966Z  fi 2025-12-04T08:58:33.0866122Z } 2025-12-04T08:58:33.0866306Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-12-04T08:58:33.0866597Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-12-04T08:58:33.0866943Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-12-04T08:58:33.0867246Z echo "system info $(uname -a)" 2025-12-04T08:58:33.0874937Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:33.0875234Z env: 2025-12-04T08:58:33.0875391Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:33.0875570Z ##[endgroup] 2025-12-04T08:58:33.1016736Z ami-id: ami-08982f1c5bf93d976 2025-12-04T08:58:33.1126755Z instance-id: i-0e5520d20214059b0 2025-12-04T08:58:33.1227203Z instance-type: g6.4xlarge 2025-12-04T08:58:33.1239760Z system info Linux ip-10-1-34-86.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-12-04T08:58:33.1259112Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:58:33.1259453Z if [ -f 
/usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:58:33.1266432Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:33.1266711Z env: 2025-12-04T08:58:33.1266866Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:33.1267051Z ##[endgroup] 2025-12-04T08:58:34.5592074Z Thu Dec 4 08:58:34 2025 2025-12-04T08:58:34.5592986Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:58:34.5594098Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T08:58:34.5595115Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:58:34.5596184Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T08:58:34.5597378Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T08:58:34.5598270Z | | | MIG M. | 2025-12-04T08:58:34.5598657Z |=========================================+========================+======================| 2025-12-04T08:58:34.5662546Z | 0 NVIDIA L4 Off | 00000000:35:00.0 Off | 0 | 2025-12-04T08:58:34.5663356Z | N/A 36C P0 30W / 72W | 0MiB / 23034MiB | 4% Default | 2025-12-04T08:58:34.5663748Z | | | N/A | 2025-12-04T08:58:34.5664115Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:58:34.5664608Z 2025-12-04T08:58:34.5664781Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:58:34.5665193Z | Processes: | 2025-12-04T08:58:34.5665596Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T08:58:34.5665962Z | ID ID Usage | 2025-12-04T08:58:34.5666263Z |=========================================================================================| 2025-12-04T08:58:34.5667301Z | No running processes found | 2025-12-04T08:58:34.5667741Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:58:34.8891374Z ##[group]Run echo 
"IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:58:34.8892234Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:58:34.8902498Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:34.8902784Z env: 2025-12-04T08:58:34.8902938Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:34.8903133Z ##[endgroup] 2025-12-04T08:58:34.8989771Z ##[group]Run if systemctl is-active --quiet docker; then 2025-12-04T08:58:34.8990174Z if systemctl is-active --quiet docker; then 2025-12-04T08:58:34.8990520Z  echo "Docker daemon is running..."; 2025-12-04T08:58:34.8990826Z else 2025-12-04T08:58:34.8991132Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2025-12-04T08:58:34.8991502Z fi 2025-12-04T08:58:34.8999063Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:34.8999339Z env: 2025-12-04T08:58:34.8999489Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:34.8999689Z ##[endgroup] 2025-12-04T08:58:34.9086457Z Docker daemon is running... 2025-12-04T08:58:34.9122866Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T08:58:34.9123083Z with: 2025-12-04T08:58:34.9123230Z shell: bash 2025-12-04T08:58:34.9123387Z timeout_minutes: 5 2025-12-04T08:58:34.9123557Z max_attempts: 3 2025-12-04T08:58:34.9123720Z retry_wait_seconds: 30 2025-12-04T08:58:34.9125371Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. 
META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-12-04T08:58:34.9127010Z polling_interval_seconds: 1 2025-12-04T08:58:34.9127215Z warning_on_retry: true 2025-12-04T08:58:34.9127402Z continue_on_error: false 2025-12-04T08:58:34.9127576Z env: 2025-12-04T08:58:34.9127727Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:34.9127912Z AWS_RETRY_MODE: standard 2025-12-04T08:58:34.9128089Z AWS_MAX_ATTEMPTS: 5 2025-12-04T08:58:34.9128273Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:58:34.9128470Z ##[endgroup] 2025-12-04T08:58:35.9086769Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:58:35.9088144Z Configure a credential helper to remove this warning. See 2025-12-04T08:58:35.9088715Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:58:35.9089053Z 2025-12-04T08:58:35.9089150Z Login Succeeded 2025-12-04T08:58:36.3720246Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:58:36.3720853Z Configure a credential helper to remove this warning. See 2025-12-04T08:58:36.3721413Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:58:36.3721751Z 2025-12-04T08:58:36.3721848Z Login Succeeded 2025-12-04T08:58:36.9885628Z Command completed after 1 attempt(s). 
2025-12-04T08:58:36.9948416Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:58:36.9948789Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:58:36.9949111Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:58:36.9958271Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:36.9958546Z env: 2025-12-04T08:58:36.9958705Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:36.9958895Z ##[endgroup] 2025-12-04T08:58:37.0055299Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:58:37.0055714Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:58:37.0056212Z # shellcheck disable=SC2046 2025-12-04T08:58:37.0056456Z docker stop $(docker ps -q) || true 2025-12-04T08:58:37.0056704Z # Prune all of the docker images 2025-12-04T08:58:37.0056936Z docker system prune -af 2025-12-04T08:58:37.0064057Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:37.0064333Z env: 2025-12-04T08:58:37.0064494Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:37.0064673Z ##[endgroup] 2025-12-04T08:58:37.0304444Z "docker stop" requires at least 1 argument. 2025-12-04T08:58:37.0304985Z See 'docker stop --help'. 2025-12-04T08:58:37.0305255Z 2025-12-04T08:58:37.0305523Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
2025-12-04T08:58:37.0305788Z 2025-12-04T08:58:37.0305895Z Stop one or more running containers 2025-12-04T08:58:37.0569903Z Total reclaimed space: 0B 2025-12-04T08:58:37.0721553Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T08:58:37.0721919Z with: 2025-12-04T08:58:37.0722507Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0723147Z use-custom-docker-registry: true 2025-12-04T08:58:37.0723370Z docker-build-dir: .ci/docker 2025-12-04T08:58:37.0723580Z docker-build-script: ./build.sh 2025-12-04T08:58:37.0723793Z working-directory: . 2025-12-04T08:58:37.0724036Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:37.0724320Z force-push: false 2025-12-04T08:58:37.0724482Z env: 2025-12-04T08:58:37.0724624Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:37.0724806Z ##[endgroup] 2025-12-04T08:58:37.0740354Z ##[group]Run set -ex 2025-12-04T08:58:37.0740579Z set -ex 2025-12-04T08:58:37.0740739Z  2025-12-04T08:58:37.0741050Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T08:58:37.0741523Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-12-04T08:58:37.0741940Z # job could then download the pre-built image as usual 2025-12-04T08:58:37.0742422Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T08:58:37.0742866Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0743101Z else 2025-12-04T08:58:37.0743295Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0743626Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0743907Z  2025-12-04T08:58:37.0744293Z  echo "Not using custom ECR registry. 
Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T08:58:37.0744751Z  exit 0 2025-12-04T08:58:37.0744901Z fi 2025-12-04T08:58:37.0745049Z  2025-12-04T08:58:37.0745294Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T08:58:37.0745726Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T08:58:37.0746094Z  # use it as it is, but first let's extract the tag 2025-12-04T08:58:37.0746432Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T08:58:37.0746791Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0747148Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0747425Z else 2025-12-04T08:58:37.0747614Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T08:58:37.0747888Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T08:58:37.0748159Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T08:58:37.0748396Z  fi 2025-12-04T08:58:37.0748712Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T08:58:37.0749299Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0749742Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0750228Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0750527Z fi 2025-12-04T08:58:37.0758000Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:37.0758267Z env: 2025-12-04T08:58:37.0758420Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:37.0758599Z REPO_NAME: pytorch 2025-12-04T08:58:37.0759303Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0760002Z 
DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:58:37.0760208Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T08:58:37.0760470Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:37.0760752Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T08:58:37.0760953Z CUSTOM_TAG_PREFIX: 2025-12-04T08:58:37.0761118Z ##[endgroup] 2025-12-04T08:58:37.0787104Z + [[ -d .ci/docker ]] 2025-12-04T08:58:37.0787383Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T08:58:37.0787637Z + [[ true == \t\r\u\e ]] 2025-12-04T08:58:37.0787856Z + echo skip=false 2025-12-04T08:58:37.0788823Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T08:58:37.0794485Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0795145Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T08:58:37.0819728Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0820572Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0821630Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0843393Z ##[group]Run set +e 2025-12-04T08:58:37.0843620Z set +e 2025-12-04T08:58:37.0843781Z set -x 2025-12-04T08:58:37.0843938Z  2025-12-04T08:58:37.0844090Z login() { 2025-12-04T08:58:37.0844442Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:58:37.0844820Z } 2025-12-04T08:58:37.0844977Z  2025-12-04T08:58:37.0845121Z retry () { 2025-12-04T08:58:37.0845319Z  $* || 
(sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:58:37.0845546Z } 2025-12-04T08:58:37.0845686Z  2025-12-04T08:58:37.0845853Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:58:37.0846063Z  2025-12-04T08:58:37.0846215Z START_TIME=$(date +%s) 2025-12-04T08:58:37.0846421Z # Wait up to 120 minutes 2025-12-04T08:58:37.0846683Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T08:58:37.0847028Z  # Check if image already exists, if it does then skip building it 2025-12-04T08:58:37.0847382Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T08:58:37.0847637Z  exit 0 2025-12-04T08:58:37.0847803Z  fi 2025-12-04T08:58:37.0847945Z  2025-12-04T08:58:37.0848217Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T08:58:37.0848678Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T08:58:37.0849284Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T08:58:37.0849642Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T08:58:37.0849938Z  # It's a Docker build job, let's build the image 2025-12-04T08:58:37.0850185Z  break 2025-12-04T08:58:37.0850348Z  else 2025-12-04T08:58:37.0850582Z  # It's a regular build job, wait for the image to become available 2025-12-04T08:58:37.0850877Z  sleep 300 2025-12-04T08:58:37.0851054Z  fi 2025-12-04T08:58:37.0851201Z done 2025-12-04T08:58:37.0851347Z  2025-12-04T08:58:37.0851595Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T08:58:37.0852122Z # be empty. 
The default action would be to continue rebuild the image 2025-12-04T08:58:37.0852497Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T08:58:37.0852821Z  # if we're on the base branch then use the parent commit 2025-12-04T08:58:37.0853117Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T08:58:37.0853341Z else 2025-12-04T08:58:37.0853572Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T08:58:37.0853910Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T08:58:37.0854158Z fi 2025-12-04T08:58:37.0854306Z  2025-12-04T08:58:37.0854472Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T08:58:37.0854725Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:58:37.0854950Z  2025-12-04T08:58:37.0855272Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T08:58:37.0855662Z  exit 0 2025-12-04T08:58:37.0855815Z fi 2025-12-04T08:58:37.0855959Z  2025-12-04T08:58:37.0856179Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T08:58:37.0856659Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T08:58:37.0857074Z  exit 1 2025-12-04T08:58:37.0857229Z fi 2025-12-04T08:58:37.0857376Z  2025-12-04T08:58:37.0857624Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T08:58:37.0858086Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T08:58:37.0858499Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T08:58:37.0858978Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T08:58:37.0859509Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T08:58:37.0859829Z fi 2025-12-04T08:58:37.0859975Z  2025-12-04T08:58:37.0860154Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 
2025-12-04T08:58:37.0866984Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:37.0867268Z env: 2025-12-04T08:58:37.0867436Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:37.0867632Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:58:37.0867882Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:58:37.0868529Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0869313Z DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.0869802Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:37.0870084Z DOCKER_PUSH: 2025-12-04T08:58:37.0870247Z ##[endgroup] 2025-12-04T08:58:37.0895532Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:37.0896106Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:37.0898361Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:58:37.0899466Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:37.5496234Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:58:37.5496804Z Configure a credential helper to remove this warning. 
See 2025-12-04T08:58:37.5497327Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:58:37.5497672Z 2025-12-04T08:58:37.5499056Z Login Succeeded 2025-12-04T08:58:37.5520357Z ++ date +%s 2025-12-04T08:58:37.5531265Z + START_TIME=1764838717 2025-12-04T08:58:37.5534799Z ++ date +%s 2025-12-04T08:58:37.5546486Z + [[ 1764831517 -lt 1764838717 ]] 2025-12-04T08:58:37.5547310Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:37.7355466Z { 2025-12-04T08:58:37.7355770Z "schemaVersion": 2, 2025-12-04T08:58:37.7356137Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T08:58:37.7356466Z "config": { 2025-12-04T08:58:37.7356704Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T08:58:37.7357007Z "size": 34864, 2025-12-04T08:58:37.7357308Z "digest": "sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301" 2025-12-04T08:58:37.7357628Z }, 2025-12-04T08:58:37.7357772Z "layers": [ 2025-12-04T08:58:37.7357920Z { 2025-12-04T08:58:37.7358145Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7358437Z "size": 30447951, 2025-12-04T08:58:37.7358752Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T08:58:37.7359077Z }, 2025-12-04T08:58:37.7359214Z { 2025-12-04T08:58:37.7359461Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7359755Z "size": 1554, 2025-12-04T08:58:37.7360176Z "digest": "sha256:0678d56345c994444b77bb70b1177189d23e794748b1d75ffc45d227c7dea94a" 2025-12-04T08:58:37.7360503Z }, 2025-12-04T08:58:37.7360632Z { 2025-12-04T08:58:37.7360856Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7361141Z "size": 313275661, 2025-12-04T08:58:37.7361442Z "digest": 
"sha256:45f5c9ddfce78349dff3d5edfbaa0310ae17311f66abdcd7e00fa21b500e801c" 2025-12-04T08:58:37.7361776Z }, 2025-12-04T08:58:37.7361907Z { 2025-12-04T08:58:37.7362123Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7362406Z "size": 787, 2025-12-04T08:58:37.7362688Z "digest": "sha256:086b1df51ac1162d9c45698e9dfaf91c6c222c8bd9ab01797ac8f9344bc8044f" 2025-12-04T08:58:37.7363016Z }, 2025-12-04T08:58:37.7363145Z { 2025-12-04T08:58:37.7363361Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7363653Z "size": 106, 2025-12-04T08:58:37.7363928Z "digest": "sha256:fe8a7b64bf98352f89057bcba66beef2fb44cc05fbd3606abccd8e86cf476234" 2025-12-04T08:58:37.7364326Z }, 2025-12-04T08:58:37.7364454Z { 2025-12-04T08:58:37.7364663Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7364939Z "size": 703, 2025-12-04T08:58:37.7365204Z "digest": "sha256:7680723e9a578033dd106b45784c639f06cc8adb1f5239ec513d9de01087c1af" 2025-12-04T08:58:37.7365512Z }, 2025-12-04T08:58:37.7365643Z { 2025-12-04T08:58:37.7365858Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7366127Z "size": 1216, 2025-12-04T08:58:37.7366401Z "digest": "sha256:9c5027aeeb4e3101f48c1d2e400c387110e1009e42497ee801f1b4b7f7edb5c0" 2025-12-04T08:58:37.7366727Z }, 2025-12-04T08:58:37.7366867Z { 2025-12-04T08:58:37.7367080Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7367613Z "size": 483, 2025-12-04T08:58:37.7367960Z "digest": "sha256:9a56521103600bd37a1e7c1191b5136c2d738c092f8a6701499f7068a32c2628" 2025-12-04T08:58:37.7368269Z }, 2025-12-04T08:58:37.7368400Z { 2025-12-04T08:58:37.7368613Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7368887Z "size": 110361875, 2025-12-04T08:58:37.7369164Z "digest": "sha256:375c4427e9141269458333b1463fdb219e736fd6231ec1c56c625c48437ace77" 2025-12-04T08:58:37.7369483Z }, 
2025-12-04T08:58:37.7369609Z { 2025-12-04T08:58:37.7369824Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7370103Z "size": 4961, 2025-12-04T08:58:37.7370381Z "digest": "sha256:a86faaa7dbdd70e678e5ea20072637ee42618921ca8f80ca089f789325d4b0c2" 2025-12-04T08:58:37.7370691Z }, 2025-12-04T08:58:37.7370818Z { 2025-12-04T08:58:37.7371190Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7371473Z "size": 1755, 2025-12-04T08:58:37.7371755Z "digest": "sha256:fb7848686804957915d98f8655ef6da0fe4c521b50a82aefdebf475983505a15" 2025-12-04T08:58:37.7372072Z }, 2025-12-04T08:58:37.7372197Z { 2025-12-04T08:58:37.7372416Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7372703Z "size": 724, 2025-12-04T08:58:37.7372966Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:58:37.7373277Z }, 2025-12-04T08:58:37.7373406Z { 2025-12-04T08:58:37.7373622Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7373900Z "size": 543, 2025-12-04T08:58:37.7374167Z "digest": "sha256:79dc80f426b29d4ae9157b967050b03e66aa0c4b1295b944a1dd70106be87066" 2025-12-04T08:58:37.7374479Z }, 2025-12-04T08:58:37.7374597Z { 2025-12-04T08:58:37.7374814Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7375092Z "size": 3185190117, 2025-12-04T08:58:37.7375381Z "digest": "sha256:a13fcc1b90bb9c251ebe7ef2a03c4cb3afa1c8bdafe84f5f85136773059a3735" 2025-12-04T08:58:37.7375705Z }, 2025-12-04T08:58:37.7375835Z { 2025-12-04T08:58:37.7376040Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7376315Z "size": 32, 2025-12-04T08:58:37.7376588Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7376899Z }, 2025-12-04T08:58:37.7377036Z { 2025-12-04T08:58:37.7377253Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7377574Z "size": 396, 2025-12-04T08:58:37.7378021Z "digest": "sha256:549db4d6c618ecd9534658a233e3c90508f82d8735f965c2786b2eaa078869e5" 2025-12-04T08:58:37.7378518Z }, 2025-12-04T08:58:37.7378660Z { 2025-12-04T08:58:37.7378889Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7379174Z "size": 236860, 2025-12-04T08:58:37.7379458Z "digest": "sha256:5c63528cb580001e65104f4cb0809bf0673a00f989a7db42fd6d86aa1ec27cee" 2025-12-04T08:58:37.7379774Z }, 2025-12-04T08:58:37.7379896Z { 2025-12-04T08:58:37.7380111Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7380393Z "size": 231, 2025-12-04T08:58:37.7380659Z "digest": "sha256:75bd83b989a44e4d4119a3f972891025eb0e9ce95cfbe4a0ca5cdbe7130028d6" 2025-12-04T08:58:37.7380979Z }, 2025-12-04T08:58:37.7381106Z { 2025-12-04T08:58:37.7381313Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7381593Z "size": 3043497, 2025-12-04T08:58:37.7381876Z "digest": "sha256:de6e78970f517178cb91f36cd02bd9ca7b72a08fb82a0f9007516026f258c035" 2025-12-04T08:58:37.7382185Z }, 2025-12-04T08:58:37.7382314Z { 2025-12-04T08:58:37.7382528Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7382805Z "size": 1472, 2025-12-04T08:58:37.7383089Z "digest": "sha256:e13ed7c7e4736e81dc21af755b3363eb26e4d3b2f1ca988dfe65effa47d8fa42" 2025-12-04T08:58:37.7383539Z }, 2025-12-04T08:58:37.7383668Z { 2025-12-04T08:58:37.7383876Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7384152Z "size": 481, 2025-12-04T08:58:37.7384432Z "digest": "sha256:6e2949bcb74152577a0f20c38bcb6dd80f5e68427e3e531a80e08c9ecc73a979" 2025-12-04T08:58:37.7384745Z }, 2025-12-04T08:58:37.7384876Z { 2025-12-04T08:58:37.7385097Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7385372Z "size": 202, 
2025-12-04T08:58:37.7385648Z "digest": "sha256:14d69d9aaec70287efd2fd35c4f93e43a29a4098458cc9fca1c93f02ad7356cb" 2025-12-04T08:58:37.7385967Z }, 2025-12-04T08:58:37.7386095Z { 2025-12-04T08:58:37.7386325Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7386600Z "size": 607, 2025-12-04T08:58:37.7386968Z "digest": "sha256:5c02769dd8e5bba2f7f5fd84bde9595fcb3bdbffcae497503fa846f9b5e78bf5" 2025-12-04T08:58:37.7387287Z }, 2025-12-04T08:58:37.7387412Z { 2025-12-04T08:58:37.7387638Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7387916Z "size": 7889619584, 2025-12-04T08:58:37.7388200Z "digest": "sha256:35041ce524ac4afec40ecd73b1393c830614f1f79d43a6439767a6c7d5b7027b" 2025-12-04T08:58:37.7388511Z }, 2025-12-04T08:58:37.7388631Z { 2025-12-04T08:58:37.7388845Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7389118Z "size": 830, 2025-12-04T08:58:37.7389376Z "digest": "sha256:2fa92dc5885e080e049ceb4139288b6c0e39fab34256945708b08ea55a1f7a0b" 2025-12-04T08:58:37.7389686Z }, 2025-12-04T08:58:37.7389811Z { 2025-12-04T08:58:37.7390042Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7390319Z "size": 33451739, 2025-12-04T08:58:37.7390608Z "digest": "sha256:2b85eafbd92a0e70a0a70154ad8bf4584095e576d95873368f30373f5966714a" 2025-12-04T08:58:37.7390918Z }, 2025-12-04T08:58:37.7391042Z { 2025-12-04T08:58:37.7391252Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7391532Z "size": 104, 2025-12-04T08:58:37.7391803Z "digest": "sha256:ff755a4ddad7880f23c6b767d432d6f1eafdb62b3ea18f8a98e22c441c099fcb" 2025-12-04T08:58:37.7392127Z }, 2025-12-04T08:58:37.7392255Z { 2025-12-04T08:58:37.7392461Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7392749Z "size": 1496, 2025-12-04T08:58:37.7393020Z "digest": 
"sha256:09eb41bdf42d8605b57b2363348154140904dec914b34a67298b82122bfce2b3" 2025-12-04T08:58:37.7393324Z }, 2025-12-04T08:58:37.7393451Z { 2025-12-04T08:58:37.7393665Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7393944Z "size": 458787828, 2025-12-04T08:58:37.7394224Z "digest": "sha256:11ede4d59e935e62f41b33220fe871794ab5e57ce724173b713368977683bcf6" 2025-12-04T08:58:37.7394534Z }, 2025-12-04T08:58:37.7394663Z { 2025-12-04T08:58:37.7394876Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7395167Z "size": 164, 2025-12-04T08:58:37.7395440Z "digest": "sha256:1283cd8f801a142172f3ab76fd472df8583223d9437de3e4d18d8cf98ea3fa98" 2025-12-04T08:58:37.7395750Z }, 2025-12-04T08:58:37.7395879Z { 2025-12-04T08:58:37.7396097Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7396369Z "size": 346, 2025-12-04T08:58:37.7396637Z "digest": "sha256:024fa855425fa524ad4500660cf61d53be62b99556d31b8b280d14caba434a35" 2025-12-04T08:58:37.7396957Z }, 2025-12-04T08:58:37.7397087Z { 2025-12-04T08:58:37.7397298Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7397577Z "size": 32, 2025-12-04T08:58:37.7397855Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7398167Z }, 2025-12-04T08:58:37.7398295Z { 2025-12-04T08:58:37.7398509Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7398914Z "size": 106, 2025-12-04T08:58:37.7399187Z "digest": "sha256:303e6747a62efecf5efa1f97d0e66b40a3b39da8d79a51f75b89f4c92ae7ec52" 2025-12-04T08:58:37.7399514Z }, 2025-12-04T08:58:37.7399642Z { 2025-12-04T08:58:37.7399852Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7400213Z "size": 424, 2025-12-04T08:58:37.7400480Z "digest": "sha256:3017cdf4838bcc9a33daebc07487f8ae1f6bd6e7ce8322c14f5480e8db9ef90e" 2025-12-04T08:58:37.7400800Z }, 
2025-12-04T08:58:37.7400927Z { 2025-12-04T08:58:37.7401137Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7401408Z "size": 19309374, 2025-12-04T08:58:37.7401695Z "digest": "sha256:6b6cd1c358e886dc6ed7fd46ac4bcc1a0a73b7b1301739ea1953478ee5d83f50" 2025-12-04T08:58:37.7402012Z }, 2025-12-04T08:58:37.7402138Z { 2025-12-04T08:58:37.7402444Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7402749Z "size": 108, 2025-12-04T08:58:37.7403019Z "digest": "sha256:b2dd045011241d1cf8889e2a7369d9fe4844dfe15529b520ccd6a59bd3c1532e" 2025-12-04T08:58:37.7403332Z }, 2025-12-04T08:58:37.7403461Z { 2025-12-04T08:58:37.7403671Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7403954Z "size": 827, 2025-12-04T08:58:37.7404228Z "digest": "sha256:55adc51fe5897031d4cf2f2b8fd162213f6e46a52848630c616606271b97952e" 2025-12-04T08:58:37.7404541Z }, 2025-12-04T08:58:37.7404663Z { 2025-12-04T08:58:37.7404878Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7405157Z "size": 724, 2025-12-04T08:58:37.7405426Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:58:37.7405741Z }, 2025-12-04T08:58:37.7405867Z { 2025-12-04T08:58:37.7406076Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7406354Z "size": 149, 2025-12-04T08:58:37.7406621Z "digest": "sha256:a43ca0e4b837964b12b7469194cfe939c26de027298040028975324dce25938a" 2025-12-04T08:58:37.7406937Z }, 2025-12-04T08:58:37.7407068Z { 2025-12-04T08:58:37.7407286Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7407558Z "size": 138, 2025-12-04T08:58:37.7407832Z "digest": "sha256:b7212f17fd1404837fcfdd086dd0e2667931e4db377d45d8d89a44390c84e11d" 2025-12-04T08:58:37.7408149Z }, 2025-12-04T08:58:37.7408283Z { 2025-12-04T08:58:37.7408496Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7408773Z "size": 141, 2025-12-04T08:58:37.7409047Z "digest": "sha256:083e42cac090e6486c35f392b64ee54448f5e4aa947003aeb3e1f92c8ea5c099" 2025-12-04T08:58:37.7409358Z }, 2025-12-04T08:58:37.7409489Z { 2025-12-04T08:58:37.7409702Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7409978Z "size": 32, 2025-12-04T08:58:37.7410256Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7410577Z }, 2025-12-04T08:58:37.7410701Z { 2025-12-04T08:58:37.7410913Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7411189Z "size": 223, 2025-12-04T08:58:37.7411455Z "digest": "sha256:0a00b784a4aac341795729b254f7edd09e811b7f51d0c58e0e6bfeeee6940503" 2025-12-04T08:58:37.7411767Z }, 2025-12-04T08:58:37.7411891Z { 2025-12-04T08:58:37.7412103Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7412384Z "size": 255, 2025-12-04T08:58:37.7412652Z "digest": "sha256:c6173c779f7ba143a21214ea5f032b141863a37ceb4c0ac01d3248c216ce5241" 2025-12-04T08:58:37.7412962Z }, 2025-12-04T08:58:37.7413083Z { 2025-12-04T08:58:37.7413293Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7413568Z "size": 145520672, 2025-12-04T08:58:37.7413845Z "digest": "sha256:ed3d1e3387b924585c332bf1bc252fa159cd0d25256a874043ff0141b1ab5ff7" 2025-12-04T08:58:37.7414160Z }, 2025-12-04T08:58:37.7414404Z { 2025-12-04T08:58:37.7414612Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7414888Z "size": 106, 2025-12-04T08:58:37.7444380Z "digest": "sha256:b29343478586aeee19d2a622661716f6f1591280c890f49b727a8da13a610784" 2025-12-04T08:58:37.7444739Z }, 2025-12-04T08:58:37.7444872Z { 2025-12-04T08:58:37.7445104Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7445479Z "size": 312293530, 
2025-12-04T08:58:37.7445782Z "digest": "sha256:c6f0520487fb506bc4601fd84d5f28d8a76b203e004731e4b2067c2ab1a14e0b" 2025-12-04T08:58:37.7446102Z }, 2025-12-04T08:58:37.7446225Z { 2025-12-04T08:58:37.7446450Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7446737Z "size": 3058011133, 2025-12-04T08:58:37.7447210Z "digest": "sha256:148171691cd4c4d20310d490d4b4dd903490d04ea07fb8f7e668a28768683e9a" 2025-12-04T08:58:37.7447536Z }, 2025-12-04T08:58:37.7447672Z { 2025-12-04T08:58:37.7447917Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7448210Z "size": 129, 2025-12-04T08:58:37.7448499Z "digest": "sha256:2c666d30ed77fff9ff1167d41cd645dad98280fcbe941f5bc3828c7ae66b1287" 2025-12-04T08:58:37.7448812Z }, 2025-12-04T08:58:37.7448936Z { 2025-12-04T08:58:37.7449153Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7449434Z "size": 880, 2025-12-04T08:58:37.7449707Z "digest": "sha256:5d8d3a0a98e012c5068e0f3bae5a03e3148ecf2d063634eee4c9241a1e3fdfb5" 2025-12-04T08:58:37.7450021Z }, 2025-12-04T08:58:37.7450150Z { 2025-12-04T08:58:37.7450405Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7450677Z "size": 724, 2025-12-04T08:58:37.7450941Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:58:37.7451242Z }, 2025-12-04T08:58:37.7451367Z { 2025-12-04T08:58:37.7451583Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7451856Z "size": 139, 2025-12-04T08:58:37.7452137Z "digest": "sha256:b06bafce9e817295d8127207747c80aa18e04392ff0875844fc30a1e794a8a0c" 2025-12-04T08:58:37.7452453Z }, 2025-12-04T08:58:37.7452572Z { 2025-12-04T08:58:37.7452790Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7453082Z "size": 32, 2025-12-04T08:58:37.7453368Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 
2025-12-04T08:58:37.7453687Z }, 2025-12-04T08:58:37.7453813Z { 2025-12-04T08:58:37.7454030Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7454300Z "size": 159, 2025-12-04T08:58:37.7454576Z "digest": "sha256:15e0d7e4590d3d8f598d05aec3a92f891bf8b4605bcc38cc2de852b6014ef8f3" 2025-12-04T08:58:37.7454904Z }, 2025-12-04T08:58:37.7455031Z { 2025-12-04T08:58:37.7455248Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7455529Z "size": 1011, 2025-12-04T08:58:37.7455801Z "digest": "sha256:a514bd1add3164d8d7ca99aa19294c4ed8b97b074635d98714c4f598a959f4cd" 2025-12-04T08:58:37.7456125Z }, 2025-12-04T08:58:37.7456255Z { 2025-12-04T08:58:37.7456465Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7456745Z "size": 724, 2025-12-04T08:58:37.7457019Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:58:37.7457331Z }, 2025-12-04T08:58:37.7457450Z { 2025-12-04T08:58:37.7457663Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7457938Z "size": 134, 2025-12-04T08:58:37.7458207Z "digest": "sha256:57b84ee6000204f27a1d9bca199b19be4c86ecd324540dbdf239c56a6c3b34ea" 2025-12-04T08:58:37.7458524Z }, 2025-12-04T08:58:37.7458649Z { 2025-12-04T08:58:37.7458860Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7459139Z "size": 32, 2025-12-04T08:58:37.7459553Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7459865Z }, 2025-12-04T08:58:37.7459989Z { 2025-12-04T08:58:37.7460204Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7460481Z "size": 157, 2025-12-04T08:58:37.7460753Z "digest": "sha256:b8babeff6d817a5961dddc15c6bdfdbd05da187fae75d5804015f99fd7c066d8" 2025-12-04T08:58:37.7461072Z }, 2025-12-04T08:58:37.7461206Z { 2025-12-04T08:58:37.7461417Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7461693Z "size": 602, 2025-12-04T08:58:37.7461965Z "digest": "sha256:83779ddf6a85ab387f64a45f274cba245b69e4fd1931ff0b5d7d3efd4b7a43bc" 2025-12-04T08:58:37.7462276Z }, 2025-12-04T08:58:37.7462401Z { 2025-12-04T08:58:37.7462695Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7462969Z "size": 724, 2025-12-04T08:58:37.7463233Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:58:37.7463556Z }, 2025-12-04T08:58:37.7463675Z { 2025-12-04T08:58:37.7463887Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7464163Z "size": 155, 2025-12-04T08:58:37.7464432Z "digest": "sha256:8b7620c0d736cc79381207ce5afe2af90f0cd7f0cd394577d2c9520d7f74762f" 2025-12-04T08:58:37.7464757Z }, 2025-12-04T08:58:37.7464892Z { 2025-12-04T08:58:37.7465120Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7465408Z "size": 32, 2025-12-04T08:58:37.7465698Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7466026Z }, 2025-12-04T08:58:37.7466152Z { 2025-12-04T08:58:37.7466379Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7466670Z "size": 188, 2025-12-04T08:58:37.7466954Z "digest": "sha256:3bcfa090e4efd3677425f76baea9f1e0c50a75d8c6b5713ec05310f1dff24539" 2025-12-04T08:58:37.7467282Z }, 2025-12-04T08:58:37.7467412Z { 2025-12-04T08:58:37.7467629Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7467918Z "size": 1370, 2025-12-04T08:58:37.7468204Z "digest": "sha256:eb0504ec4d9218a79896b604f73dc0ea5a0f96266ad9c2cdbbbe5f0f18222694" 2025-12-04T08:58:37.7468526Z }, 2025-12-04T08:58:37.7468648Z { 2025-12-04T08:58:37.7468876Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7469161Z "size": 32, 
2025-12-04T08:58:37.7469432Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7469748Z }, 2025-12-04T08:58:37.7469875Z { 2025-12-04T08:58:37.7470083Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7470361Z "size": 136, 2025-12-04T08:58:37.7470640Z "digest": "sha256:15d0fec09d7b196a1462d51516ee90fc3443ba178d3e56d59cacf32146b4321d" 2025-12-04T08:58:37.7470950Z }, 2025-12-04T08:58:37.7471076Z { 2025-12-04T08:58:37.7471289Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7471567Z "size": 528, 2025-12-04T08:58:37.7471841Z "digest": "sha256:cca81fcc62a949959ca4dd3c9056fb293d548ef8607127eeeef6cfd3a8897ca8" 2025-12-04T08:58:37.7472162Z }, 2025-12-04T08:58:37.7472288Z { 2025-12-04T08:58:37.7472496Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7472789Z "size": 32, 2025-12-04T08:58:37.7473064Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7473373Z }, 2025-12-04T08:58:37.7473509Z { 2025-12-04T08:58:37.7473729Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7474000Z "size": 104, 2025-12-04T08:58:37.7474283Z "digest": "sha256:b0b8f9b5c6ab98db9cd830dc584e1b6aec9add139e4cc48d8c243d36691e25b4" 2025-12-04T08:58:37.7474609Z }, 2025-12-04T08:58:37.7474732Z { 2025-12-04T08:58:37.7475034Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7475314Z "size": 435, 2025-12-04T08:58:37.7475581Z "digest": "sha256:0606ca4d47a8a70e91e92b03ca51a85e731641b09342136a54ef2f2a6d9dfb44" 2025-12-04T08:58:37.7475887Z }, 2025-12-04T08:58:37.7476015Z { 2025-12-04T08:58:37.7476233Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7476503Z "size": 32, 2025-12-04T08:58:37.7476779Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 
2025-12-04T08:58:37.7477097Z }, 2025-12-04T08:58:37.7477218Z { 2025-12-04T08:58:37.7477442Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7477726Z "size": 109, 2025-12-04T08:58:37.7478089Z "digest": "sha256:2f80a4e1b3b95ed67bb781ea787e8a63e46de79117d9d8e65c257072b38afa2d" 2025-12-04T08:58:37.7478418Z }, 2025-12-04T08:58:37.7478551Z { 2025-12-04T08:58:37.7478765Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7479070Z "size": 1896, 2025-12-04T08:58:37.7479356Z "digest": "sha256:35c916fb1bd057e517dcab78c3a2a018e68096d8993892ad84f47562d37ae352" 2025-12-04T08:58:37.7479673Z }, 2025-12-04T08:58:37.7479807Z { 2025-12-04T08:58:37.7480087Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7480382Z "size": 197526165, 2025-12-04T08:58:37.7480667Z "digest": "sha256:195537b7dafc96192f768323b1a8cc2a914d41959849b73198579576b0872a44" 2025-12-04T08:58:37.7480992Z }, 2025-12-04T08:58:37.7481125Z { 2025-12-04T08:58:37.7481341Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7481623Z "size": 106, 2025-12-04T08:58:37.7481903Z "digest": "sha256:dc454fd3967e5735b2498b7f1d958a2c626987d5e4ce225ca98da3cd945b59f3" 2025-12-04T08:58:37.7482225Z }, 2025-12-04T08:58:37.7482352Z { 2025-12-04T08:58:37.7482572Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7482854Z "size": 165, 2025-12-04T08:58:37.7483124Z "digest": "sha256:701b34f115fa897181c046dc37288e87cbc3ad74c36a9e2224b5bfe7c5703afb" 2025-12-04T08:58:37.7483441Z }, 2025-12-04T08:58:37.7483566Z { 2025-12-04T08:58:37.7483788Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7484070Z "size": 7944, 2025-12-04T08:58:37.7484349Z "digest": "sha256:39cefc00ffedebc9098261c798408b87a20c95a88fccb110594077f48dadf760" 2025-12-04T08:58:37.7484660Z }, 2025-12-04T08:58:37.7484785Z { 2025-12-04T08:58:37.7485002Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7485273Z "size": 8071, 2025-12-04T08:58:37.7485545Z "digest": "sha256:6ae51eb61a325b2c2995a5088c81aa20821b75be65b5aa722c7c40556b5d03ea" 2025-12-04T08:58:37.7485856Z }, 2025-12-04T08:58:37.7485982Z { 2025-12-04T08:58:37.7486214Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7486492Z "size": 304, 2025-12-04T08:58:37.7486770Z "digest": "sha256:1fd5341e66dfc0c1ae23af014641a92a6fd02640c528fe6d4dc55921ed659a26" 2025-12-04T08:58:37.7487081Z }, 2025-12-04T08:58:37.7487206Z { 2025-12-04T08:58:37.7487423Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7487700Z "size": 13364291, 2025-12-04T08:58:37.7487991Z "digest": "sha256:72a7c87e35e40ab796f90aee1b51add7902f0cdc44406d2505b6c6a1f55a8da6" 2025-12-04T08:58:37.7488317Z }, 2025-12-04T08:58:37.7488440Z { 2025-12-04T08:58:37.7488653Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7488932Z "size": 108, 2025-12-04T08:58:37.7489204Z "digest": "sha256:ec36862ac98ebaac52ee1a8b1d162d45bd0e3bf59ae7e19c8f80ad3960b4c600" 2025-12-04T08:58:37.7489530Z }, 2025-12-04T08:58:37.7489660Z { 2025-12-04T08:58:37.7489873Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7490157Z "size": 54145699, 2025-12-04T08:58:37.7490535Z "digest": "sha256:05ddbf246e8add0e293474dbf88bb028d5a295a25ac59e8648a18db644377773" 2025-12-04T08:58:37.7490855Z }, 2025-12-04T08:58:37.7490980Z { 2025-12-04T08:58:37.7491195Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:58:37.7491476Z "size": 32, 2025-12-04T08:58:37.7491746Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:58:37.7492063Z } 2025-12-04T08:58:37.7492195Z ] 2025-12-04T08:58:37.7492318Z } 2025-12-04T08:58:37.7492477Z + exit 0 2025-12-04T08:58:37.7528789Z ##[group]Run set -eux 2025-12-04T08:58:37.7529004Z set 
-eux 2025-12-04T08:58:37.7529295Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T08:58:37.7530244Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T08:58:37.7538336Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:37.7538627Z env: 2025-12-04T08:58:37.7538789Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:37.7538979Z ##[endgroup] 2025-12-04T08:58:37.7577462Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-12-04T08:58:37.7578308Z + jq --raw-output .SecretString 2025-12-04T08:58:37.7579349Z + jq -r .docker_hub_readonly_token 2025-12-04T08:58:37.7580545Z + docker login --username pytorchbot --password-stdin 2025-12-04T08:58:38.2670162Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:58:38.2670839Z Configure a credential helper to remove this warning. 
See 2025-12-04T08:58:38.2671385Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:58:38.2671736Z 2025-12-04T08:58:38.2672048Z Login Succeeded 2025-12-04T08:58:38.2752807Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:58:38.2753118Z tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:58:38.2753436Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-12-04T08:58:38.2761282Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:38.2761559Z env: 2025-12-04T08:58:38.2761717Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:38.2762315Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:38.2762924Z ##[endgroup] 2025-12-04T08:58:38.2791014Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:38.2830066Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T08:58:38.2830417Z with: 2025-12-04T08:58:38.2830979Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:38.2831672Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:38.2831942Z env: 2025-12-04T08:58:38.2832098Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:38.2832284Z ##[endgroup] 2025-12-04T08:58:38.2845217Z ##[group]Run set -x 2025-12-04T08:58:38.2845413Z set -x 2025-12-04T08:58:38.2845566Z set +e 2025-12-04T08:58:38.2845717Z  2025-12-04T08:58:38.2845868Z login() { 2025-12-04T08:58:38.2846208Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:58:38.2846584Z } 2025-12-04T08:58:38.2846732Z  2025-12-04T08:58:38.2846917Z retry () { 2025-12-04T08:58:38.2847104Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 
2025-12-04T08:58:38.2847323Z } 2025-12-04T08:58:38.2847469Z  2025-12-04T08:58:38.2847636Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:58:38.2847849Z  2025-12-04T08:58:38.2848358Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T08:58:38.2848823Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T08:58:38.2849086Z  2025-12-04T08:58:38.2849234Z set -e 2025-12-04T08:58:38.2849477Z # ignore output since only exit code is used for conditional 2025-12-04T08:58:38.2849821Z # only pull docker image if it's not available locally 2025-12-04T08:58:38.2850199Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T08:58:38.2850558Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T08:58:38.2850782Z fi 2025-12-04T08:58:38.2857666Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:58:38.2857931Z env: 2025-12-04T08:58:38.2858088Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:58:38.2858665Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:38.2859335Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:38.2859609Z ##[endgroup] 2025-12-04T08:58:38.2885279Z + set +e 2025-12-04T08:58:38.2885745Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:38.2886383Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:38.2888803Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:58:38.2889962Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:58:38.7437128Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:58:38.7437830Z Configure a credential helper to remove this warning. 
See 2025-12-04T08:58:38.7438444Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:58:38.7438875Z 2025-12-04T08:58:38.7441833Z Login Succeeded 2025-12-04T08:58:38.7470740Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:38.7472192Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T08:58:38.9303598Z + IMAGE_SIZE=15091.581844329834 2025-12-04T08:58:38.9303973Z + echo 'Compressed size of image in MB: 15091.581844329834' 2025-12-04T08:58:38.9304331Z + set -e 2025-12-04T08:58:38.9305082Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:38.9306265Z Compressed size of image in MB: 15091.581844329834 2025-12-04T08:58:38.9422142Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:38.9423173Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:58:39.1547473Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T08:58:39.1550819Z 63e5bc7682b8: Pulling fs layer 2025-12-04T08:58:39.1551316Z 0678d56345c9: Pulling fs layer 2025-12-04T08:58:39.1551740Z 45f5c9ddfce7: Pulling fs layer 2025-12-04T08:58:39.1552248Z 086b1df51ac1: Pulling fs layer 2025-12-04T08:58:39.1552646Z fe8a7b64bf98: Pulling fs layer 2025-12-04T08:58:39.1553044Z 7680723e9a57: Pulling fs layer 2025-12-04T08:58:39.1553473Z 9c5027aeeb4e: Pulling fs layer 2025-12-04T08:58:39.1553877Z 9a5652110360: Pulling fs layer 2025-12-04T08:58:39.1554286Z 
375c4427e914: Pulling fs layer 2025-12-04T08:58:39.1554660Z a86faaa7dbdd: Pulling fs layer 2025-12-04T08:58:39.1554999Z fb7848686804: Pulling fs layer 2025-12-04T08:58:39.1555329Z 3541df015cdb: Pulling fs layer 2025-12-04T08:58:39.1555942Z 79dc80f426b2: Pulling fs layer 2025-12-04T08:58:39.1556305Z a13fcc1b90bb: Pulling fs layer 2025-12-04T08:58:39.1556629Z 086b1df51ac1: Waiting 2025-12-04T08:58:39.1556942Z 4f4fb700ef54: Pulling fs layer 2025-12-04T08:58:39.1557289Z 549db4d6c618: Pulling fs layer 2025-12-04T08:58:39.1557607Z 5c63528cb580: Pulling fs layer 2025-12-04T08:58:39.1557927Z fe8a7b64bf98: Waiting 2025-12-04T08:58:39.1558215Z 75bd83b989a4: Pulling fs layer 2025-12-04T08:58:39.1558541Z 7680723e9a57: Waiting 2025-12-04T08:58:39.1558798Z 9c5027aeeb4e: Waiting 2025-12-04T08:58:39.1558969Z 3541df015cdb: Waiting 2025-12-04T08:58:39.1559130Z 9a5652110360: Waiting 2025-12-04T08:58:39.1559305Z de6e78970f51: Pulling fs layer 2025-12-04T08:58:39.1559514Z e13ed7c7e473: Pulling fs layer 2025-12-04T08:58:39.1559712Z 6e2949bcb741: Pulling fs layer 2025-12-04T08:58:39.1559976Z 14d69d9aaec7: Pulling fs layer 2025-12-04T08:58:39.1560174Z 79dc80f426b2: Waiting 2025-12-04T08:58:39.1560347Z 5c02769dd8e5: Pulling fs layer 2025-12-04T08:58:39.1560530Z a13fcc1b90bb: Waiting 2025-12-04T08:58:39.1560700Z a86faaa7dbdd: Waiting 2025-12-04T08:58:39.1560868Z 35041ce524ac: Pulling fs layer 2025-12-04T08:58:39.1561052Z 2fa92dc5885e: Pulling fs layer 2025-12-04T08:58:39.1561233Z 549db4d6c618: Waiting 2025-12-04T08:58:39.1561404Z 2b85eafbd92a: Pulling fs layer 2025-12-04T08:58:39.1561578Z 5c63528cb580: Waiting 2025-12-04T08:58:39.1561731Z fb7848686804: Waiting 2025-12-04T08:58:39.1561884Z 375c4427e914: Waiting 2025-12-04T08:58:39.1562048Z 4f4fb700ef54: Waiting 2025-12-04T08:58:39.1562200Z 5c02769dd8e5: Waiting 2025-12-04T08:58:39.1562353Z 6e2949bcb741: Waiting 2025-12-04T08:58:39.1562505Z 35041ce524ac: Waiting 2025-12-04T08:58:39.1562662Z 14d69d9aaec7: Waiting 2025-12-04T08:58:39.1562899Z 
ff755a4ddad7: Pulling fs layer 2025-12-04T08:58:39.1563079Z e13ed7c7e473: Waiting 2025-12-04T08:58:39.1563230Z 2fa92dc5885e: Waiting 2025-12-04T08:58:39.1563394Z 09eb41bdf42d: Pulling fs layer 2025-12-04T08:58:39.1563580Z 11ede4d59e93: Pulling fs layer 2025-12-04T08:58:39.1563759Z ff755a4ddad7: Waiting 2025-12-04T08:58:39.1563913Z 75bd83b989a4: Waiting 2025-12-04T08:58:39.1564080Z 1283cd8f801a: Pulling fs layer 2025-12-04T08:58:39.1564258Z 024fa855425f: Pulling fs layer 2025-12-04T08:58:39.1564452Z 2b85eafbd92a: Waiting 2025-12-04T08:58:39.1564610Z 024fa855425f: Waiting 2025-12-04T08:58:39.1564811Z 1283cd8f801a: Waiting 2025-12-04T08:58:39.1564972Z 303e6747a62e: Pulling fs layer 2025-12-04T08:58:39.1565155Z 3017cdf4838b: Pulling fs layer 2025-12-04T08:58:39.1565334Z 6b6cd1c358e8: Pulling fs layer 2025-12-04T08:58:39.1565509Z 303e6747a62e: Waiting 2025-12-04T08:58:39.1565675Z b2dd04501124: Pulling fs layer 2025-12-04T08:58:39.1565852Z 3017cdf4838b: Waiting 2025-12-04T08:58:39.1566209Z 6b6cd1c358e8: Waiting 2025-12-04T08:58:39.1566377Z 11ede4d59e93: Waiting 2025-12-04T08:58:39.1566540Z 55adc51fe589: Pulling fs layer 2025-12-04T08:58:39.1566730Z 09eb41bdf42d: Waiting 2025-12-04T08:58:39.1566885Z b2dd04501124: Waiting 2025-12-04T08:58:39.1567047Z a43ca0e4b837: Pulling fs layer 2025-12-04T08:58:39.1567234Z b7212f17fd14: Pulling fs layer 2025-12-04T08:58:39.1567411Z a43ca0e4b837: Waiting 2025-12-04T08:58:39.1567574Z 083e42cac090: Pulling fs layer 2025-12-04T08:58:39.1567745Z b7212f17fd14: Waiting 2025-12-04T08:58:39.1567903Z 083e42cac090: Waiting 2025-12-04T08:58:39.1568075Z 55adc51fe589: Waiting 2025-12-04T08:58:39.1568236Z 0a00b784a4aa: Pulling fs layer 2025-12-04T08:58:39.1568532Z c6173c779f7b: Pulling fs layer 2025-12-04T08:58:39.1568720Z ed3d1e3387b9: Pulling fs layer 2025-12-04T08:58:39.1568908Z b29343478586: Pulling fs layer 2025-12-04T08:58:39.1569128Z c6173c779f7b: Waiting 2025-12-04T08:58:39.1569297Z ed3d1e3387b9: Waiting 2025-12-04T08:58:39.1569510Z 
0a00b784a4aa: Waiting 2025-12-04T08:58:39.1569672Z c6f0520487fb: Pulling fs layer 2025-12-04T08:58:39.1569857Z 148171691cd4: Pulling fs layer 2025-12-04T08:58:39.1570099Z 2c666d30ed77: Pulling fs layer 2025-12-04T08:58:39.1570286Z 5d8d3a0a98e0: Pulling fs layer 2025-12-04T08:58:39.1570466Z b29343478586: Waiting 2025-12-04T08:58:39.1570725Z 2c666d30ed77: Waiting 2025-12-04T08:58:39.1570890Z b06bafce9e81: Pulling fs layer 2025-12-04T08:58:39.1571086Z 5d8d3a0a98e0: Waiting 2025-12-04T08:58:39.1571251Z 148171691cd4: Waiting 2025-12-04T08:58:39.1571405Z b06bafce9e81: Waiting 2025-12-04T08:58:39.1571569Z 15e0d7e4590d: Pulling fs layer 2025-12-04T08:58:39.1571753Z c6f0520487fb: Waiting 2025-12-04T08:58:39.1571911Z a514bd1add31: Pulling fs layer 2025-12-04T08:58:39.1572096Z a514bd1add31: Waiting 2025-12-04T08:58:39.1572258Z 57b84ee60002: Pulling fs layer 2025-12-04T08:58:39.1572431Z 15e0d7e4590d: Waiting 2025-12-04T08:58:39.1572578Z 57b84ee60002: Waiting 2025-12-04T08:58:39.1572738Z b8babeff6d81: Pulling fs layer 2025-12-04T08:58:39.1572927Z 83779ddf6a85: Pulling fs layer 2025-12-04T08:58:39.1573101Z b8babeff6d81: Waiting 2025-12-04T08:58:39.1573263Z 8b7620c0d736: Pulling fs layer 2025-12-04T08:58:39.1573450Z 3bcfa090e4ef: Pulling fs layer 2025-12-04T08:58:39.1573623Z 83779ddf6a85: Waiting 2025-12-04T08:58:39.1573787Z eb0504ec4d92: Pulling fs layer 2025-12-04T08:58:39.1573972Z 8b7620c0d736: Waiting 2025-12-04T08:58:39.1574141Z 15d0fec09d7b: Pulling fs layer 2025-12-04T08:58:39.1574330Z cca81fcc62a9: Pulling fs layer 2025-12-04T08:58:39.1574511Z 3bcfa090e4ef: Waiting 2025-12-04T08:58:39.1574665Z 15d0fec09d7b: Waiting 2025-12-04T08:58:39.1574827Z eb0504ec4d92: Waiting 2025-12-04T08:58:39.1574991Z b0b8f9b5c6ab: Pulling fs layer 2025-12-04T08:58:39.1575164Z cca81fcc62a9: Waiting 2025-12-04T08:58:39.1575330Z 0606ca4d47a8: Pulling fs layer 2025-12-04T08:58:39.1575509Z b0b8f9b5c6ab: Waiting 2025-12-04T08:58:39.1575673Z 2f80a4e1b3b9: Pulling fs layer 2025-12-04T08:58:39.1575855Z 
35c916fb1bd0: Pulling fs layer 2025-12-04T08:58:39.1576052Z 195537b7dafc: Pulling fs layer 2025-12-04T08:58:39.1576236Z 0606ca4d47a8: Waiting 2025-12-04T08:58:39.1576386Z 2f80a4e1b3b9: Waiting 2025-12-04T08:58:39.1576552Z dc454fd3967e: Pulling fs layer 2025-12-04T08:58:39.1576730Z 35c916fb1bd0: Waiting 2025-12-04T08:58:39.1576947Z 701b34f115fa: Pulling fs layer 2025-12-04T08:58:39.1577133Z dc454fd3967e: Waiting 2025-12-04T08:58:39.1588331Z 39cefc00ffed: Pulling fs layer 2025-12-04T08:58:39.1588560Z 701b34f115fa: Waiting 2025-12-04T08:58:39.1588759Z 6ae51eb61a32: Pulling fs layer 2025-12-04T08:58:39.1588971Z 1fd5341e66df: Pulling fs layer 2025-12-04T08:58:39.1589160Z 72a7c87e35e4: Pulling fs layer 2025-12-04T08:58:39.1589351Z ec36862ac98e: Pulling fs layer 2025-12-04T08:58:39.1589524Z 1fd5341e66df: Waiting 2025-12-04T08:58:39.1589696Z 39cefc00ffed: Waiting 2025-12-04T08:58:39.1589858Z 6ae51eb61a32: Waiting 2025-12-04T08:58:39.1590004Z 72a7c87e35e4: Waiting 2025-12-04T08:58:39.1590167Z 05ddbf246e8a: Pulling fs layer 2025-12-04T08:58:39.1590350Z ec36862ac98e: Waiting 2025-12-04T08:58:39.1590681Z 05ddbf246e8a: Waiting 2025-12-04T08:58:39.2277660Z 0678d56345c9: Verifying Checksum 2025-12-04T08:58:39.2278318Z 0678d56345c9: Download complete 2025-12-04T08:58:39.3260699Z 086b1df51ac1: Download complete 2025-12-04T08:58:39.4145522Z fe8a7b64bf98: Download complete 2025-12-04T08:58:39.4974398Z 7680723e9a57: Verifying Checksum 2025-12-04T08:58:39.4974680Z 7680723e9a57: Download complete 2025-12-04T08:58:39.5123425Z 63e5bc7682b8: Download complete 2025-12-04T08:58:39.5613885Z 9c5027aeeb4e: Verifying Checksum 2025-12-04T08:58:39.5614244Z 9c5027aeeb4e: Download complete 2025-12-04T08:58:39.6031199Z 9a5652110360: Verifying Checksum 2025-12-04T08:58:39.6031472Z 9a5652110360: Download complete 2025-12-04T08:58:39.6640839Z a86faaa7dbdd: Verifying Checksum 2025-12-04T08:58:39.6641343Z a86faaa7dbdd: Download complete 2025-12-04T08:58:39.7477925Z fb7848686804: Verifying Checksum 
2025-12-04T08:58:39.7478340Z fb7848686804: Download complete 2025-12-04T08:58:39.8344466Z 3541df015cdb: Download complete 2025-12-04T08:58:39.9083486Z 79dc80f426b2: Verifying Checksum 2025-12-04T08:58:39.9083868Z 79dc80f426b2: Download complete 2025-12-04T08:58:40.3790113Z 63e5bc7682b8: Pull complete 2025-12-04T08:58:40.4009085Z 0678d56345c9: Pull complete 2025-12-04T08:58:40.7043607Z 375c4427e914: Verifying Checksum 2025-12-04T08:58:40.7044264Z 375c4427e914: Download complete 2025-12-04T08:58:40.7111367Z 4f4fb700ef54: Verifying Checksum 2025-12-04T08:58:40.7111658Z 4f4fb700ef54: Download complete 2025-12-04T08:58:40.7843256Z 549db4d6c618: Download complete 2025-12-04T08:58:40.8465371Z 5c63528cb580: Verifying Checksum 2025-12-04T08:58:40.9213050Z 5c63528cb580: Download complete 2025-12-04T08:58:40.9213428Z 75bd83b989a4: Verifying Checksum 2025-12-04T08:58:40.9213655Z 75bd83b989a4: Download complete 2025-12-04T08:58:41.0092324Z de6e78970f51: Verifying Checksum 2025-12-04T08:58:41.0092676Z de6e78970f51: Download complete 2025-12-04T08:58:41.0719420Z e13ed7c7e473: Verifying Checksum 2025-12-04T08:58:41.0719736Z e13ed7c7e473: Download complete 2025-12-04T08:58:41.1404202Z 6e2949bcb741: Download complete 2025-12-04T08:58:41.2099949Z 14d69d9aaec7: Download complete 2025-12-04T08:58:41.2953005Z 5c02769dd8e5: Download complete 2025-12-04T08:58:42.3430012Z 45f5c9ddfce7: Verifying Checksum 2025-12-04T08:58:42.3430471Z 45f5c9ddfce7: Download complete 2025-12-04T08:58:42.4204875Z 2fa92dc5885e: Verifying Checksum 2025-12-04T08:58:42.4205173Z 2fa92dc5885e: Download complete 2025-12-04T08:58:42.8284215Z 2b85eafbd92a: Verifying Checksum 2025-12-04T08:58:42.8284707Z 2b85eafbd92a: Download complete 2025-12-04T08:58:42.8892606Z ff755a4ddad7: Download complete 2025-12-04T08:58:42.9686090Z 09eb41bdf42d: Verifying Checksum 2025-12-04T08:58:42.9686615Z 09eb41bdf42d: Download complete 2025-12-04T08:58:47.6006796Z 11ede4d59e93: Verifying Checksum 2025-12-04T08:58:47.6007163Z 11ede4d59e93: 
Download complete 2025-12-04T08:58:47.6779977Z 1283cd8f801a: Verifying Checksum 2025-12-04T08:58:47.6780276Z 1283cd8f801a: Download complete 2025-12-04T08:58:47.7395022Z 024fa855425f: Verifying Checksum 2025-12-04T08:58:47.7395662Z 024fa855425f: Download complete 2025-12-04T08:58:47.8209123Z 303e6747a62e: Verifying Checksum 2025-12-04T08:58:47.8209577Z 303e6747a62e: Download complete 2025-12-04T08:58:47.9116810Z 3017cdf4838b: Verifying Checksum 2025-12-04T08:58:47.9117582Z 3017cdf4838b: Download complete 2025-12-04T08:58:48.1443117Z 6b6cd1c358e8: Verifying Checksum 2025-12-04T08:58:48.1443432Z 6b6cd1c358e8: Download complete 2025-12-04T08:58:48.2261239Z b2dd04501124: Verifying Checksum 2025-12-04T08:58:48.2261537Z b2dd04501124: Download complete 2025-12-04T08:58:48.2774043Z 55adc51fe589: Verifying Checksum 2025-12-04T08:58:48.2774434Z 55adc51fe589: Download complete 2025-12-04T08:58:48.3603096Z a43ca0e4b837: Verifying Checksum 2025-12-04T08:58:48.3603437Z a43ca0e4b837: Download complete 2025-12-04T08:58:48.4568859Z b7212f17fd14: Download complete 2025-12-04T08:58:48.5670077Z 083e42cac090: Download complete 2025-12-04T08:58:48.6753811Z 0a00b784a4aa: Download complete 2025-12-04T08:58:48.7632409Z c6173c779f7b: Download complete 2025-12-04T08:58:49.3466875Z 45f5c9ddfce7: Pull complete 2025-12-04T08:58:49.3683444Z 086b1df51ac1: Pull complete 2025-12-04T08:58:49.3890907Z fe8a7b64bf98: Pull complete 2025-12-04T08:58:49.4099891Z 7680723e9a57: Pull complete 2025-12-04T08:58:49.4330392Z 9c5027aeeb4e: Pull complete 2025-12-04T08:58:49.4537981Z 9a5652110360: Pull complete 2025-12-04T08:58:50.2603346Z ed3d1e3387b9: Verifying Checksum 2025-12-04T08:58:50.2603654Z ed3d1e3387b9: Download complete 2025-12-04T08:58:50.3281025Z b29343478586: Download complete 2025-12-04T08:58:51.3847447Z 375c4427e914: Pull complete 2025-12-04T08:58:51.5367216Z a86faaa7dbdd: Pull complete 2025-12-04T08:58:51.7544843Z fb7848686804: Pull complete 2025-12-04T08:58:51.9246232Z 3541df015cdb: Pull complete 
2025-12-04T08:58:52.0322390Z 79dc80f426b2: Pull complete 2025-12-04T08:58:53.4892508Z c6f0520487fb: Verifying Checksum 2025-12-04T08:58:53.4892872Z c6f0520487fb: Download complete 2025-12-04T08:59:11.7949970Z a13fcc1b90bb: Verifying Checksum 2025-12-04T08:59:11.7950271Z a13fcc1b90bb: Download complete 2025-12-04T08:59:11.8758532Z 2c666d30ed77: Download complete 2025-12-04T08:59:11.9464251Z 5d8d3a0a98e0: Verifying Checksum 2025-12-04T08:59:11.9464639Z 5d8d3a0a98e0: Download complete 2025-12-04T08:59:12.0379304Z b06bafce9e81: Download complete 2025-12-04T08:59:12.1083085Z 15e0d7e4590d: Download complete 2025-12-04T08:59:12.2582182Z 57b84ee60002: Verifying Checksum 2025-12-04T08:59:12.2582573Z 57b84ee60002: Download complete 2025-12-04T08:59:12.3395244Z b8babeff6d81: Verifying Checksum 2025-12-04T08:59:12.3395570Z b8babeff6d81: Download complete 2025-12-04T08:59:12.4512261Z 83779ddf6a85: Verifying Checksum 2025-12-04T08:59:12.4512563Z 83779ddf6a85: Download complete 2025-12-04T08:59:12.5199550Z 8b7620c0d736: Verifying Checksum 2025-12-04T08:59:12.5200074Z 8b7620c0d736: Download complete 2025-12-04T08:59:12.5847812Z 3bcfa090e4ef: Verifying Checksum 2025-12-04T08:59:12.5848127Z 3bcfa090e4ef: Download complete 2025-12-04T08:59:12.6578919Z eb0504ec4d92: Download complete 2025-12-04T08:59:12.7354960Z 15d0fec09d7b: Verifying Checksum 2025-12-04T08:59:12.7355419Z 15d0fec09d7b: Download complete 2025-12-04T08:59:12.8002885Z cca81fcc62a9: Verifying Checksum 2025-12-04T08:59:12.8003333Z cca81fcc62a9: Download complete 2025-12-04T08:59:12.8967342Z b0b8f9b5c6ab: Verifying Checksum 2025-12-04T08:59:12.8967771Z b0b8f9b5c6ab: Download complete 2025-12-04T08:59:12.9666897Z 0606ca4d47a8: Download complete 2025-12-04T08:59:13.0357609Z 2f80a4e1b3b9: Download complete 2025-12-04T08:59:13.0957861Z 35c916fb1bd0: Verifying Checksum 2025-12-04T08:59:13.0958174Z 35c916fb1bd0: Download complete 2025-12-04T08:59:15.1149902Z 195537b7dafc: Verifying Checksum 2025-12-04T08:59:15.1150304Z 
195537b7dafc: Download complete 2025-12-04T08:59:15.1867048Z dc454fd3967e: Verifying Checksum 2025-12-04T08:59:15.1867595Z dc454fd3967e: Download complete 2025-12-04T08:59:15.2735184Z 701b34f115fa: Verifying Checksum 2025-12-04T08:59:15.2735783Z 701b34f115fa: Download complete 2025-12-04T08:59:15.3564127Z 39cefc00ffed: Verifying Checksum 2025-12-04T08:59:15.3564707Z 39cefc00ffed: Download complete 2025-12-04T08:59:15.4412206Z 6ae51eb61a32: Verifying Checksum 2025-12-04T08:59:15.4412603Z 6ae51eb61a32: Download complete 2025-12-04T08:59:15.5316543Z 1fd5341e66df: Verifying Checksum 2025-12-04T08:59:15.5317203Z 1fd5341e66df: Download complete 2025-12-04T08:59:15.7195394Z 72a7c87e35e4: Download complete 2025-12-04T08:59:15.7876759Z ec36862ac98e: Verifying Checksum 2025-12-04T08:59:15.7877288Z ec36862ac98e: Download complete 2025-12-04T08:59:16.3794642Z 05ddbf246e8a: Verifying Checksum 2025-12-04T08:59:16.3794956Z 05ddbf246e8a: Download complete 2025-12-04T08:59:24.1228190Z 148171691cd4: Verifying Checksum 2025-12-04T08:59:24.1228624Z 148171691cd4: Download complete 2025-12-04T09:00:02.0273717Z 35041ce524ac: Verifying Checksum 2025-12-04T09:00:02.0274026Z 35041ce524ac: Download complete 2025-12-04T09:00:32.9743998Z a13fcc1b90bb: Pull complete 2025-12-04T09:00:33.1767352Z 4f4fb700ef54: Pull complete 2025-12-04T09:00:33.3351169Z 549db4d6c618: Pull complete 2025-12-04T09:00:33.4766145Z 5c63528cb580: Pull complete 2025-12-04T09:00:33.5536632Z 75bd83b989a4: Pull complete 2025-12-04T09:00:33.7630396Z de6e78970f51: Pull complete 2025-12-04T09:00:33.9828367Z e13ed7c7e473: Pull complete 2025-12-04T09:00:34.1997282Z 6e2949bcb741: Pull complete 2025-12-04T09:00:34.4059058Z 14d69d9aaec7: Pull complete 2025-12-04T09:00:34.6100826Z 5c02769dd8e5: Pull complete 2025-12-04T09:02:07.5438834Z 35041ce524ac: Pull complete 2025-12-04T09:02:07.7550595Z 2fa92dc5885e: Pull complete 2025-12-04T09:02:08.3243892Z 2b85eafbd92a: Pull complete 2025-12-04T09:02:08.5458917Z ff755a4ddad7: Pull complete 
2025-12-04T09:02:08.5889901Z 09eb41bdf42d: Pull complete 2025-12-04T09:02:15.0556669Z 11ede4d59e93: Pull complete 2025-12-04T09:02:15.2714728Z 1283cd8f801a: Pull complete 2025-12-04T09:02:15.4910613Z 024fa855425f: Pull complete 2025-12-04T09:02:15.9371453Z 303e6747a62e: Pull complete 2025-12-04T09:02:16.1571346Z 3017cdf4838b: Pull complete 2025-12-04T09:02:16.5432220Z 6b6cd1c358e8: Pull complete 2025-12-04T09:02:16.7674686Z b2dd04501124: Pull complete 2025-12-04T09:02:16.9732119Z 55adc51fe589: Pull complete 2025-12-04T09:02:17.3881846Z a43ca0e4b837: Pull complete 2025-12-04T09:02:17.6029733Z b7212f17fd14: Pull complete 2025-12-04T09:02:17.8179863Z 083e42cac090: Pull complete 2025-12-04T09:02:18.2411123Z 0a00b784a4aa: Pull complete 2025-12-04T09:02:18.4620196Z c6173c779f7b: Pull complete 2025-12-04T09:02:21.2072356Z ed3d1e3387b9: Pull complete 2025-12-04T09:02:21.4280437Z b29343478586: Pull complete 2025-12-04T09:02:22.4863175Z c6f0520487fb: Pull complete 2025-12-04T09:03:06.0565910Z 148171691cd4: Pull complete 2025-12-04T09:03:06.2947333Z 2c666d30ed77: Pull complete 2025-12-04T09:03:06.4990987Z 5d8d3a0a98e0: Pull complete 2025-12-04T09:03:06.8366147Z b06bafce9e81: Pull complete 2025-12-04T09:03:07.1562059Z 15e0d7e4590d: Pull complete 2025-12-04T09:03:07.2750898Z a514bd1add31: Pull complete 2025-12-04T09:03:07.4056939Z 57b84ee60002: Pull complete 2025-12-04T09:03:07.6220025Z b8babeff6d81: Pull complete 2025-12-04T09:03:07.7959365Z 83779ddf6a85: Pull complete 2025-12-04T09:03:08.1557494Z 8b7620c0d736: Pull complete 2025-12-04T09:03:08.4596249Z 3bcfa090e4ef: Pull complete 2025-12-04T09:03:08.6464626Z eb0504ec4d92: Pull complete 2025-12-04T09:03:08.9987675Z 15d0fec09d7b: Pull complete 2025-12-04T09:03:09.2045198Z cca81fcc62a9: Pull complete 2025-12-04T09:03:09.5467074Z b0b8f9b5c6ab: Pull complete 2025-12-04T09:03:09.7625940Z 0606ca4d47a8: Pull complete 2025-12-04T09:03:10.1242307Z 2f80a4e1b3b9: Pull complete 2025-12-04T09:03:10.3285623Z 35c916fb1bd0: Pull complete 
2025-12-04T09:03:15.5435198Z 195537b7dafc: Pull complete 2025-12-04T09:03:15.7492096Z dc454fd3967e: Pull complete 2025-12-04T09:03:15.9742809Z 701b34f115fa: Pull complete 2025-12-04T09:03:16.1993277Z 39cefc00ffed: Pull complete 2025-12-04T09:03:16.4171098Z 6ae51eb61a32: Pull complete 2025-12-04T09:03:16.6549940Z 1fd5341e66df: Pull complete 2025-12-04T09:03:18.0347784Z 72a7c87e35e4: Pull complete 2025-12-04T09:03:18.2387536Z ec36862ac98e: Pull complete 2025-12-04T09:03:19.4699795Z 05ddbf246e8a: Pull complete 2025-12-04T09:03:19.7312394Z Digest: sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T09:03:19.7705816Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:19.7894905Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:19.7953823Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:19.7954561Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:19.7963587Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:19.7963867Z env: 2025-12-04T09:03:19.7964029Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:19.7964211Z ##[endgroup] 2025-12-04T09:03:19.8101813Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-12-04T09:03:19.8102156Z with: 2025-12-04T09:03:19.8102346Z driver-version: 580.82.07 2025-12-04T09:03:19.8102552Z env: 2025-12-04T09:03:19.8102720Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:19.8102913Z ##[endgroup] 2025-12-04T09:03:19.8199522Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo 
true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:19.8200295Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:19.8207647Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:19.8207925Z env: 2025-12-04T09:03:19.8208088Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:19.8208269Z ##[endgroup] 2025-12-04T09:03:19.8317286Z ##[group]Run set -euo pipefail 2025-12-04T09:03:19.8317639Z set -euo pipefail 2025-12-04T09:03:19.8317903Z  2025-12-04T09:03:19.8318092Z has_gpu=false 2025-12-04T09:03:19.8318319Z devices="" 2025-12-04T09:03:19.8318804Z  2025-12-04T09:03:19.8319052Z if command -v nvidia-smi >/dev/null 2>&1; then 2025-12-04T09:03:19.8319469Z  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:19.8319808Z  has_gpu=true 2025-12-04T09:03:19.8320159Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:19.8320432Z  fi 2025-12-04T09:03:19.8320625Z fi 2025-12-04T09:03:19.8320778Z  2025-12-04T09:03:19.8320933Z if [ "$has_gpu" = false ]; then 2025-12-04T09:03:19.8321203Z  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:19.8321469Z  has_gpu=true 2025-12-04T09:03:19.8321669Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:19.8321881Z  fi 2025-12-04T09:03:19.8322026Z fi 2025-12-04T09:03:19.8322177Z  2025-12-04T09:03:19.8322407Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then 2025-12-04T09:03:19.8322777Z  if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:19.8323063Z  has_gpu=true 2025-12-04T09:03:19.8323264Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:19.8323482Z  fi 2025-12-04T09:03:19.8323630Z fi 2025-12-04T09:03:19.8323767Z  2025-12-04T09:03:19.8323977Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:19.8324347Z printf 'DETECTED_DEVICES<> "$GITHUB_OUTPUT" 2025-12-04T09:03:19.8331362Z shell: /usr/bin/bash 
--noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:19.8331630Z env: 2025-12-04T09:03:19.8331784Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:19.8331967Z ##[endgroup] 2025-12-04T09:03:21.4649636Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:21.4650026Z if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:21.4650368Z  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}" 2025-12-04T09:03:21.4650847Z  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-12-04T09:03:21.4651193Z else 2025-12-04T09:03:21.4651393Z  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}" 2025-12-04T09:03:21.4651647Z fi 2025-12-04T09:03:21.4659590Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:21.4660045Z env: 2025-12-04T09:03:21.4660226Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:21.4660418Z HAS_NVIDIA: true 2025-12-04T09:03:21.4660583Z ##[endgroup] 2025-12-04T09:03:21.4740019Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-12-04T09:03:21.4740323Z with: 2025-12-04T09:03:21.4740473Z timeout_minutes: 10 2025-12-04T09:03:21.4740649Z max_attempts: 3 2025-12-04T09:03:21.4759686Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. 
/etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-container-toolkit-1.17.8 \ libnvidia-container-tools-1.17.8 \ libnvidia-container1-1.17.8 \ nvidia-container-toolkit-base-1.17.8 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-container-toolkit-1.17.8 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). 
Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. 
When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! 
| # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. 
This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-12-04T09:03:21.4778737Z retry_wait_seconds: 10 2025-12-04T09:03:21.4778934Z polling_interval_seconds: 1 2025-12-04T09:03:21.4779137Z warning_on_retry: true 2025-12-04T09:03:21.4779330Z continue_on_error: false 2025-12-04T09:03:21.4779509Z env: 2025-12-04T09:03:21.4779654Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:21.4779842Z HAS_NVIDIA_GPU: true 2025-12-04T09:03:21.4780062Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:03:21.4780317Z DRIVER_VERSION: 580.82.07 2025-12-04T09:03:21.4780509Z ##[endgroup] 2025-12-04T09:03:21.5471290Z == Installing nvidia driver NVIDIA-Linux-x86_64-580.82.07.run == 2025-12-04T09:03:21.5473076Z + pre_install_nvidia_driver_amzn2 2025-12-04T09:03:21.5474535Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-12-04T09:03:22.1240532Z No match for argument: nvidia-driver-latest-dkms 2025-12-04T09:03:22.1241343Z No packages marked for removal. 2025-12-04T09:03:22.1296202Z Dependencies resolved. 2025-12-04T09:03:22.1304756Z Nothing to do. 2025-12-04T09:03:22.1305743Z Complete! 
2025-12-04T09:03:22.2014362Z + install_nvidia_driver_common 2025-12-04T09:03:22.2020399Z + echo 'Before installing NVIDIA driver' 2025-12-04T09:03:22.2022141Z + lspci 2025-12-04T09:03:22.2024696Z Before installing NVIDIA driver 2025-12-04T09:03:22.2825371Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:22.2826213Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:22.2826871Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:22.2827542Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:22.2828422Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:22.2828898Z 01:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2829168Z 02:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2829646Z 03:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2830109Z 03:00.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2830575Z 03:00.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2831273Z 03:00.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2831562Z 03:00.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2831818Z 03:00.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2832070Z 03:00.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2832312Z 03:00.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2832551Z 03:01.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2832785Z 03:01.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2833023Z 03:01.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2833274Z 03:01.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2833515Z 03:01.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2833749Z 03:01.5 PCI bridge: Amazon.com, Inc. 
Device 0200 2025-12-04T09:03:22.2833985Z 03:01.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2834223Z 03:01.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2834462Z 03:02.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2834708Z 03:02.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2835101Z 03:02.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2835732Z 03:02.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2836109Z 03:02.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2836461Z 03:02.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2836915Z 03:02.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2837238Z 03:02.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2837480Z 03:03.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2837721Z 03:03.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2837958Z 03:03.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2838197Z 03:03.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2838438Z 03:03.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2838681Z 03:03.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2838921Z 03:03.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2839163Z 03:03.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2839408Z 24:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2839654Z 25:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2840034Z 26:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2840504Z 26:00.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2840956Z 26:00.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2841407Z 26:00.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2841868Z 26:00.4 PCI bridge: Amazon.com, Inc. 
Device 0200 2025-12-04T09:03:22.2842211Z 26:00.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2842460Z 26:00.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2842704Z 26:00.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2842951Z 26:01.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2843291Z 27:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:22.2843839Z 30:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2844282Z 31:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2844647Z 32:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2844976Z 33:00.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-12-04T09:03:22.2845310Z 34:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:22.2845600Z 35:00.0 3D controller: NVIDIA Corporation AD104GL [L4] (rev a1) 2025-12-04T09:03:22.2846035Z + lsmod 2025-12-04T09:03:22.2870261Z Module Size Used by 2025-12-04T09:03:22.2870585Z nvidia_uvm 1925120 0 2025-12-04T09:03:22.2870866Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:22.2871375Z drm 602112 1 nvidia 2025-12-04T09:03:22.2871910Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:22.2872465Z backlight 24576 1 drm 2025-12-04T09:03:22.2872775Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:22.2873053Z xt_conntrack 16384 1 2025-12-04T09:03:22.2873302Z nft_chain_nat 16384 3 2025-12-04T09:03:22.2873528Z xt_MASQUERADE 20480 1 2025-12-04T09:03:22.2873808Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:22.2874118Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:22.2874482Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:22.2874891Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:22.2875172Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:22.2875436Z xfrm_user 57344 1 2025-12-04T09:03:22.2875699Z xfrm_algo 16384 1 xfrm_user 
2025-12-04T09:03:22.2875968Z xt_addrtype 16384 2 2025-12-04T09:03:22.2876190Z nft_compat 20480 4 2025-12-04T09:03:22.2876469Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:22.2876857Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:22.2877328Z br_netfilter 36864 0 2025-12-04T09:03:22.2877578Z bridge 323584 1 br_netfilter 2025-12-04T09:03:22.2877851Z stp 16384 1 bridge 2025-12-04T09:03:22.2878114Z llc 16384 2 bridge,stp 2025-12-04T09:03:22.2878369Z overlay 167936 0 2025-12-04T09:03:22.2878589Z tls 139264 0 2025-12-04T09:03:22.2878805Z nls_ascii 16384 1 2025-12-04T09:03:22.2879022Z nls_cp437 20480 1 2025-12-04T09:03:22.2879239Z vfat 24576 1 2025-12-04T09:03:22.2879461Z fat 86016 1 vfat 2025-12-04T09:03:22.2879694Z sunrpc 700416 1 2025-12-04T09:03:22.2880030Z ena 184320 0 2025-12-04T09:03:22.2880259Z i8042 45056 0 2025-12-04T09:03:22.2880491Z serio 28672 3 i8042 2025-12-04T09:03:22.2880733Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:22.2880969Z button 24576 0 2025-12-04T09:03:22.2881193Z sch_fq_codel 20480 9 2025-12-04T09:03:22.2881419Z dm_mod 188416 0 2025-12-04T09:03:22.2881614Z fuse 184320 1 2025-12-04T09:03:22.2881794Z loop 36864 0 2025-12-04T09:03:22.2881968Z configfs 57344 1 2025-12-04T09:03:22.2882150Z dmi_sysfs 20480 0 2025-12-04T09:03:22.2882332Z crc32_pclmul 16384 0 2025-12-04T09:03:22.2882509Z crc32c_intel 24576 0 2025-12-04T09:03:22.2882694Z efivarfs 24576 1 2025-12-04T09:03:22.2882884Z + modinfo nvidia 2025-12-04T09:03:22.2888734Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:22.2889144Z import_ns: DMA_BUF 2025-12-04T09:03:22.2889356Z alias: char-major-195-* 2025-12-04T09:03:22.2889729Z version: 580.82.07 2025-12-04T09:03:22.2890076Z supported: external 2025-12-04T09:03:22.2890395Z license: Dual MIT/GPL 2025-12-04T09:03:22.2890784Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:22.2891256Z firmware: nvidia/580.82.07/gsp_ga10x.bin 
2025-12-04T09:03:22.2891658Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:22.2891933Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:22.2892206Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:22.2892472Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:22.2892740Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:22.2892998Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:22.2893320Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:22.2893778Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:22.2894222Z depends: i2c-core,drm 2025-12-04T09:03:22.2894520Z retpoline: Y 2025-12-04T09:03:22.2894832Z name: nvidia 2025-12-04T09:03:22.2895127Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:22.2895492Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:22.2895830Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:03:22.2896142Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:22.2896370Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:22.2896593Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:22.2896818Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:22.2897036Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:22.2897258Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:22.2897523Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:22.2897840Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:22.2898316Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:22.2898768Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:22.2899198Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:22.2899714Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:22.2900172Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:22.2900460Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:22.2900765Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:22.2901068Z parm: 
NVreg_DynamicPowerManagement:int 2025-12-04T09:03:22.2901373Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:22.2901781Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:22.2902248Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:22.2902738Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:22.2903035Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:22.2903286Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:22.2903530Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:22.2903781Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:22.2904023Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:22.2904262Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:22.2904514Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:22.2904780Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:03:22.2905044Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:22.2905303Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:22.2905545Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:22.2905798Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:22.2906049Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:22.2906299Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:22.2906547Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:22.2906789Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:22.2907003Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:22.2907255Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:22.2907497Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:22.2907720Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:22.2907957Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:22.2908216Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:22.2908467Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:22.2908700Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:22.2908949Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:22.2909189Z parm: 
rm_firmware_active:charp 2025-12-04T09:03:22.2909420Z + HAS_NVIDIA_DRIVER=0 2025-12-04T09:03:22.2909605Z ++ command -v nvidia-smi 2025-12-04T09:03:22.2909799Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:03:22.2909983Z + set +e 2025-12-04T09:03:22.2910344Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:03:23.9130812Z + INSTALLED_DRIVER_VERSION=580.82.07 2025-12-04T09:03:23.9131128Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:03:23.9132097Z + '[' 0 -ne 0 ']' 2025-12-04T09:03:23.9132275Z + '[' 580.82.07 '!=' 580.82.07 ']' 2025-12-04T09:03:23.9132477Z + HAS_NVIDIA_DRIVER=1 2025-12-04T09:03:23.9132825Z + echo 'NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation' 2025-12-04T09:03:23.9133184Z + set -e 2025-12-04T09:03:23.9133331Z + '[' 1 -eq 0 ']' 2025-12-04T09:03:23.9133638Z NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation 2025-12-04T09:03:23.9134866Z + post_install_nvidia_driver_common 2025-12-04T09:03:23.9138199Z + sudo modprobe nvidia 2025-12-04T09:03:24.0406134Z + echo 'After installing NVIDIA driver' 2025-12-04T09:03:24.0406597Z + lspci 2025-12-04T09:03:24.0406810Z After installing NVIDIA driver 2025-12-04T09:03:24.0570725Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:24.0571287Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:24.0571830Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:24.0572638Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:24.0573092Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:24.0573517Z 01:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0573837Z 02:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0574279Z 03:00.0 PCI bridge: Amazon.com, Inc. 
Device 0200 2025-12-04T09:03:24.0574635Z 03:00.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0574931Z 03:00.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0575218Z 03:00.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0575515Z 03:00.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0575807Z 03:00.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0576098Z 03:00.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0576391Z 03:00.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0576690Z 03:01.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0576976Z 03:01.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0577285Z 03:01.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0577582Z 03:01.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0577870Z 03:01.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0578154Z 03:01.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0578444Z 03:01.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0578732Z 03:01.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0579021Z 03:02.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0579310Z 03:02.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0579603Z 03:02.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0579896Z 03:02.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0580192Z 03:02.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0580506Z 03:02.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0580798Z 03:02.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0581092Z 03:02.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0581387Z 03:03.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0581685Z 03:03.1 PCI bridge: Amazon.com, Inc. 
Device 0200 2025-12-04T09:03:24.0581943Z 03:03.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0582179Z 03:03.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0582430Z 03:03.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0582677Z 03:03.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0582921Z 03:03.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0583159Z 03:03.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0583604Z 24:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0583858Z 25:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0584108Z 26:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0584350Z 26:00.1 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0584587Z 26:00.2 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0584830Z 26:00.3 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0585076Z 26:00.4 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0585310Z 26:00.5 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0585551Z 26:00.6 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0585797Z 26:00.7 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0586039Z 26:01.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0586352Z 27:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:24.0586676Z 30:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0586922Z 31:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0587159Z 32:00.0 PCI bridge: Amazon.com, Inc. Device 0200 2025-12-04T09:03:24.0587565Z 33:00.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-12-04T09:03:24.0587889Z 34:00.0 PCI bridge: Amazon.com, Inc. 
Device 0200 2025-12-04T09:03:24.0588167Z 35:00.0 3D controller: NVIDIA Corporation AD104GL [L4] (rev a1) 2025-12-04T09:03:24.0588428Z + lsmod 2025-12-04T09:03:24.0603159Z Module Size Used by 2025-12-04T09:03:24.0603579Z nvidia_uvm 1925120 0 2025-12-04T09:03:24.0603838Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:24.0604107Z drm 602112 1 nvidia 2025-12-04T09:03:24.0604386Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:24.0604720Z backlight 24576 1 drm 2025-12-04T09:03:24.0605077Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:24.0605348Z xt_conntrack 16384 1 2025-12-04T09:03:24.0605590Z nft_chain_nat 16384 3 2025-12-04T09:03:24.0605821Z xt_MASQUERADE 20480 1 2025-12-04T09:03:24.0606088Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:24.0606390Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:24.0606869Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:24.0607282Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:24.0607562Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:24.0607814Z xfrm_user 57344 1 2025-12-04T09:03:24.0608075Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:03:24.0608336Z xt_addrtype 16384 2 2025-12-04T09:03:24.0608556Z nft_compat 20480 4 2025-12-04T09:03:24.0608828Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:24.0609212Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:24.0609554Z br_netfilter 36864 0 2025-12-04T09:03:24.0609799Z bridge 323584 1 br_netfilter 2025-12-04T09:03:24.0610069Z stp 16384 1 bridge 2025-12-04T09:03:24.0610331Z llc 16384 2 bridge,stp 2025-12-04T09:03:24.0610577Z overlay 167936 0 2025-12-04T09:03:24.0610797Z tls 139264 0 2025-12-04T09:03:24.0611019Z nls_ascii 16384 1 2025-12-04T09:03:24.0611232Z nls_cp437 20480 1 2025-12-04T09:03:24.0611446Z vfat 24576 1 2025-12-04T09:03:24.0611670Z fat 86016 1 vfat 2025-12-04T09:03:24.0611909Z sunrpc 700416 1 2025-12-04T09:03:24.0612087Z ena 184320 0 
2025-12-04T09:03:24.0612260Z i8042 45056 0 2025-12-04T09:03:24.0612444Z serio 28672 3 i8042 2025-12-04T09:03:24.0612638Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:24.0612828Z button 24576 0 2025-12-04T09:03:24.0613008Z sch_fq_codel 20480 9 2025-12-04T09:03:24.0613302Z dm_mod 188416 0 2025-12-04T09:03:24.0613484Z fuse 184320 1 2025-12-04T09:03:24.0613668Z loop 36864 0 2025-12-04T09:03:24.0613845Z configfs 57344 1 2025-12-04T09:03:24.0614024Z dmi_sysfs 20480 0 2025-12-04T09:03:24.0614202Z crc32_pclmul 16384 0 2025-12-04T09:03:24.0614374Z crc32c_intel 24576 0 2025-12-04T09:03:24.0614553Z efivarfs 24576 1 2025-12-04T09:03:24.0614729Z + modinfo nvidia 2025-12-04T09:03:24.0620299Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:24.0620680Z import_ns: DMA_BUF 2025-12-04T09:03:24.0620867Z alias: char-major-195-* 2025-12-04T09:03:24.0621067Z version: 580.82.07 2025-12-04T09:03:24.0621251Z supported: external 2025-12-04T09:03:24.0621437Z license: Dual MIT/GPL 2025-12-04T09:03:24.0621668Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:24.0621980Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:03:24.0622269Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:24.0622777Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:24.0623125Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:24.0623446Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:24.0623756Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:24.0624059Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:24.0624480Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:24.0624925Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:24.0625211Z depends: i2c-core,drm 2025-12-04T09:03:24.0625451Z retpoline: Y 2025-12-04T09:03:24.0625654Z name: nvidia 2025-12-04T09:03:24.0625999Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:24.0626442Z parm: 
NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:24.0626867Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:03:24.0627250Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:24.0627547Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:24.0627814Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:24.0628098Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:24.0628382Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:24.0628648Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:24.0628973Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:24.0629324Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:24.0629621Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:24.0629880Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:24.0630158Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:24.0630490Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:24.0630862Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:24.0631235Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:24.0631708Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:24.0632019Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:03:24.0632327Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:24.0632635Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:24.0632893Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:24.0633198Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:24.0633488Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:24.0633743Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:24.0633980Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:24.0634214Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:24.0634449Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:24.0634851Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:24.0635106Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:24.0635376Z parm: 
NVreg_RegisterPCIDriver:int 2025-12-04T09:03:24.0635636Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:24.0635891Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:24.0636153Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:24.0636407Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:24.0636656Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:24.0636901Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:24.0637148Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:24.0637388Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:24.0637593Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:24.0637843Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:24.0638082Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:24.0638299Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:24.0638535Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:24.0638923Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:24.0639171Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:24.0639403Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:24.0639652Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:24.0640010Z parm: rm_firmware_active:charp 2025-12-04T09:03:24.0640212Z + set +e 2025-12-04T09:03:24.0640358Z + nvidia-smi 2025-12-04T09:03:25.4899249Z Thu Dec 4 09:03:25 2025 2025-12-04T09:03:25.4900013Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:03:25.4900976Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:03:25.4901936Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:03:25.4903294Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:03:25.4904425Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:03:25.4905290Z | | | MIG M. 
| 2025-12-04T09:03:25.4905948Z |=========================================+========================+======================| 2025-12-04T09:03:25.4969029Z | 0 NVIDIA L4 Off | 00000000:35:00.0 Off | 0 | 2025-12-04T09:03:25.4969450Z | N/A 35C P0 30W / 72W | 0MiB / 23034MiB | 4% Default | 2025-12-04T09:03:25.4969800Z | | | N/A | 2025-12-04T09:03:25.4970156Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:03:25.4970650Z 2025-12-04T09:03:25.4970835Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:03:25.4971228Z | Processes: | 2025-12-04T09:03:25.4971662Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:03:25.4972032Z | ID ID Usage | 2025-12-04T09:03:25.4972327Z |=========================================================================================| 2025-12-04T09:03:25.4974218Z | No running processes found | 2025-12-04T09:03:25.4974659Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:03:25.8213139Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T09:03:27.2458346Z NVIDIA L4 2025-12-04T09:03:27.4269878Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:03:27.4270425Z + '[' 0 -eq 0 ']' 2025-12-04T09:03:27.4270640Z + echo 'INFO: Ignoring allowed status 0' 2025-12-04T09:03:27.4270859Z + set -e 2025-12-04T09:03:27.4271037Z INFO: Ignoring allowed status 0 2025-12-04T09:03:27.4277875Z == Installing nvidia container toolkit for amzn2023 == 2025-12-04T09:03:27.4282076Z + sudo yum install -y yum-utils 2025-12-04T09:03:27.8082834Z Last metadata expiration check: 0:07:57 ago on Thu Dec 4 08:55:30 2025. 2025-12-04T09:03:27.8315376Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed. 2025-12-04T09:03:27.8743182Z Dependencies resolved. 2025-12-04T09:03:27.8972948Z Nothing to do. 2025-12-04T09:03:27.8973291Z Complete! 
2025-12-04T09:03:27.9846329Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]] 2025-12-04T09:03:27.9846952Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:03:27.9847864Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:03:28.3546519Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:03:28.3945932Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8 2025-12-04T09:03:28.9132584Z nvidia-container-toolkit 24 kB/s | 833 B 00:00 2025-12-04T09:03:28.9792047Z Dependencies resolved. 2025-12-04T09:03:29.0017635Z ================================================================================ 2025-12-04T09:03:29.0018093Z Package Arch Version Repository Size 2025-12-04T09:03:29.0018464Z ================================================================================ 2025-12-04T09:03:29.0018755Z Downgrading: 2025-12-04T09:03:29.0019116Z libnvidia-container-tools x86_64 1.17.8-1 nvidia-container-toolkit 40 k 2025-12-04T09:03:29.0019713Z libnvidia-container1 x86_64 1.17.8-1 nvidia-container-toolkit 1.0 M 2025-12-04T09:03:29.0020244Z nvidia-container-toolkit x86_64 1.17.8-1 nvidia-container-toolkit 1.2 M 2025-12-04T09:03:29.0020787Z nvidia-container-toolkit-base x86_64 1.17.8-1 nvidia-container-toolkit 5.8 M 2025-12-04T09:03:29.0021118Z 2025-12-04T09:03:29.0021209Z Transaction Summary 2025-12-04T09:03:29.0021447Z ================================================================================ 2025-12-04T09:03:29.0021732Z Downgrade 4 Packages 2025-12-04T09:03:29.0021871Z 2025-12-04T09:03:29.0021972Z Total download size: 8.0 M 2025-12-04T09:03:29.0023517Z Downloading Packages: 2025-12-04T09:03:29.1078887Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 387 kB/s | 40 kB 00:00 
2025-12-04T09:03:29.1555400Z (2/4): nvidia-container-toolkit-1.17.8-1.x86_64 8.2 MB/s | 1.2 MB 00:00 2025-12-04T09:03:29.2010690Z (3/4): libnvidia-container1-1.17.8-1.x86_64.rpm 5.0 MB/s | 1.0 MB 00:00 2025-12-04T09:03:29.2728036Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x 35 MB/s | 5.8 MB 00:00 2025-12-04T09:03:29.2736375Z -------------------------------------------------------------------------------- 2025-12-04T09:03:29.2739078Z Total 30 MB/s | 8.0 MB 00:00 2025-12-04T09:03:29.2741659Z Running transaction check 2025-12-04T09:03:29.2879417Z Transaction check succeeded. 2025-12-04T09:03:29.2880313Z Running transaction test 2025-12-04T09:03:29.3309170Z Transaction test succeeded. 2025-12-04T09:03:29.3312202Z Running transaction 2025-12-04T09:03:29.9209024Z Preparing : 1/1 2025-12-04T09:03:30.0557383Z Downgrading : nvidia-container-toolkit-base-1.17.8-1.x86_64 1/8 2025-12-04T09:03:30.0809955Z Downgrading : libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:03:30.1328577Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:03:30.2273127Z Downgrading : libnvidia-container-tools-1.17.8-1.x86_64 3/8 2025-12-04T09:03:30.2684423Z Downgrading : nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:03:30.3469468Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:03:30.3526455Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:03:30.3527655Z Cleanup : nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:03:30.3861542Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:03:30.3912557Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:03:30.3913931Z Cleanup : libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:03:30.4230309Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:03:30.4293466Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:03:30.4294470Z Cleanup : 
libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:03:30.4613118Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:03:30.4667645Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:03:30.4668677Z Cleanup : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:03:30.4927205Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:03:30.5359189Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 8/8 2025-12-04T09:04:16.7571633Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:04:16.7574101Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 1/8 2025-12-04T09:04:16.7574752Z Verifying : libnvidia-container-tools-1.18.1-1.x86_64 2/8 2025-12-04T09:04:16.7575278Z Verifying : libnvidia-container1-1.17.8-1.x86_64 3/8 2025-12-04T09:04:16.7575779Z Verifying : libnvidia-container1-1.18.1-1.x86_64 4/8 2025-12-04T09:04:16.7576289Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 5/8 2025-12-04T09:04:16.7576703Z Verifying : nvidia-container-toolkit-1.18.1-1.x86_64 6/8 2025-12-04T09:04:16.7577090Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 7/8 2025-12-04T09:04:16.8958425Z Verifying : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8================================================================================ 2025-12-04T09:04:16.8958974Z WARNING: 2025-12-04T09:04:16.8959212Z A newer release of "Amazon Linux" is available. 
2025-12-04T09:04:16.8959435Z 2025-12-04T09:04:16.8959525Z Available Versions: 2025-12-04T09:04:16.8959667Z 2025-12-04T09:04:16.8959767Z Version 2023.9.20250929: 2025-12-04T09:04:16.8960169Z Run the following command to upgrade to 2023.9.20250929: 2025-12-04T09:04:16.8960415Z 2025-12-04T09:04:16.8960550Z dnf upgrade --releasever=2023.9.20250929 2025-12-04T09:04:16.8960752Z 2025-12-04T09:04:16.8960842Z Release notes: 2025-12-04T09:04:16.8961262Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html 2025-12-04T09:04:16.8961628Z 2025-12-04T09:04:16.8961711Z Version 2023.9.20251014: 2025-12-04T09:04:16.8962001Z Run the following command to upgrade to 2023.9.20251014: 2025-12-04T09:04:16.8962232Z 2025-12-04T09:04:16.8962340Z dnf upgrade --releasever=2023.9.20251014 2025-12-04T09:04:16.8962542Z 2025-12-04T09:04:16.8962618Z Release notes: 2025-12-04T09:04:16.8962979Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html 2025-12-04T09:04:16.8963328Z 2025-12-04T09:04:16.8963418Z Version 2023.9.20251020: 2025-12-04T09:04:16.8963965Z Run the following command to upgrade to 2023.9.20251020: 2025-12-04T09:04:16.8964221Z 2025-12-04T09:04:16.8964335Z dnf upgrade --releasever=2023.9.20251020 2025-12-04T09:04:16.8964556Z 2025-12-04T09:04:16.8964645Z Release notes: 2025-12-04T09:04:16.8965032Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html 2025-12-04T09:04:16.8965372Z 2025-12-04T09:04:16.8965453Z Version 2023.9.20251027: 2025-12-04T09:04:16.8965742Z Run the following command to upgrade to 2023.9.20251027: 2025-12-04T09:04:16.8965970Z 2025-12-04T09:04:16.8966081Z dnf upgrade --releasever=2023.9.20251027 2025-12-04T09:04:16.8966280Z 2025-12-04T09:04:16.8966346Z Release notes: 2025-12-04T09:04:16.8966643Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html 2025-12-04T09:04:16.8966923Z 2025-12-04T09:04:16.8966989Z Version 2023.9.20251105: 
2025-12-04T09:04:16.8967220Z Run the following command to upgrade to 2023.9.20251105: 2025-12-04T09:04:16.8967407Z 2025-12-04T09:04:16.8967499Z dnf upgrade --releasever=2023.9.20251105 2025-12-04T09:04:16.8967661Z 2025-12-04T09:04:16.8967725Z Release notes: 2025-12-04T09:04:16.8968194Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html 2025-12-04T09:04:16.8968463Z 2025-12-04T09:04:16.8968535Z Version 2023.9.20251110: 2025-12-04T09:04:16.8968749Z Run the following command to upgrade to 2023.9.20251110: 2025-12-04T09:04:16.8968938Z 2025-12-04T09:04:16.8969022Z dnf upgrade --releasever=2023.9.20251110 2025-12-04T09:04:16.8969169Z 2025-12-04T09:04:16.8969239Z Release notes: 2025-12-04T09:04:16.8969520Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html 2025-12-04T09:04:16.8969801Z 2025-12-04T09:04:16.8969865Z Version 2023.9.20251117: 2025-12-04T09:04:16.8970085Z Run the following command to upgrade to 2023.9.20251117: 2025-12-04T09:04:16.8970263Z 2025-12-04T09:04:16.8970351Z dnf upgrade --releasever=2023.9.20251117 2025-12-04T09:04:16.8970505Z 2025-12-04T09:04:16.8970565Z Release notes: 2025-12-04T09:04:16.8970852Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html 2025-12-04T09:04:16.8971124Z 2025-12-04T09:04:16.8971218Z ================================================================================ 2025-12-04T09:04:16.9422958Z 2025-12-04T09:04:16.9423081Z 2025-12-04T09:04:16.9423160Z Downgraded: 2025-12-04T09:04:16.9423524Z libnvidia-container-tools-1.17.8-1.x86_64 2025-12-04T09:04:16.9424050Z libnvidia-container1-1.17.8-1.x86_64 2025-12-04T09:04:16.9424545Z nvidia-container-toolkit-1.17.8-1.x86_64 2025-12-04T09:04:16.9425070Z nvidia-container-toolkit-base-1.17.8-1.x86_64 2025-12-04T09:04:16.9425393Z 2025-12-04T09:04:16.9425484Z Complete! 
2025-12-04T09:04:17.0135681Z + sudo systemctl restart docker 2025-12-04T09:04:22.3383785Z Thu Dec 4 09:04:22 2025 2025-12-04T09:04:22.3384240Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:04:22.3384765Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:04:22.3385218Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:04:22.3385682Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:04:22.3386208Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:04:22.3386597Z | | | MIG M. | 2025-12-04T09:04:22.3386889Z |=========================================+========================+======================| 2025-12-04T09:04:22.3455623Z | 0 NVIDIA L4 On | 00000000:35:00.0 Off | 0 | 2025-12-04T09:04:22.3456406Z | N/A 35C P0 30W / 72W | 0MiB / 23034MiB | 4% Default | 2025-12-04T09:04:22.3456795Z | | | N/A | 2025-12-04T09:04:22.3457171Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:04:22.3457440Z 2025-12-04T09:04:22.3457598Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:04:22.3457987Z | Processes: | 2025-12-04T09:04:22.3458407Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:04:22.3458787Z | ID ID Usage | 2025-12-04T09:04:22.3459086Z |=========================================================================================| 2025-12-04T09:04:22.3460286Z | No running processes found | 2025-12-04T09:04:22.3460718Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:04:22.5054171Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally 2025-12-04T09:04:22.6988049Z 3.13: Pulling from docker/library/python 2025-12-04T09:04:22.7759308Z 53c88f1dfeb7: Pulling fs layer 
2025-12-04T09:04:22.7759570Z eae668646f44: Pulling fs layer 2025-12-04T09:04:22.7760116Z ff2e6e687b6c: Pulling fs layer 2025-12-04T09:04:22.7760330Z 7c40a3faff76: Pulling fs layer 2025-12-04T09:04:22.7760530Z 967a3b1c8fef: Pulling fs layer 2025-12-04T09:04:22.7760712Z a64e1a44f22a: Pulling fs layer 2025-12-04T09:04:22.7760913Z 52655f8a5bcc: Pulling fs layer 2025-12-04T09:04:22.7761098Z 52655f8a5bcc: Waiting 2025-12-04T09:04:22.7762497Z a64e1a44f22a: Waiting 2025-12-04T09:04:22.7762775Z 967a3b1c8fef: Waiting 2025-12-04T09:04:22.7763067Z 7c40a3faff76: Waiting 2025-12-04T09:04:22.9102802Z eae668646f44: Verifying Checksum 2025-12-04T09:04:22.9103184Z eae668646f44: Download complete 2025-12-04T09:04:22.9632502Z 53c88f1dfeb7: Verifying Checksum 2025-12-04T09:04:22.9633998Z 53c88f1dfeb7: Download complete 2025-12-04T09:04:22.9747297Z ff2e6e687b6c: Verifying Checksum 2025-12-04T09:04:22.9748579Z ff2e6e687b6c: Download complete 2025-12-04T09:04:23.0123901Z 967a3b1c8fef: Verifying Checksum 2025-12-04T09:04:23.0124385Z 967a3b1c8fef: Download complete 2025-12-04T09:04:23.0625284Z 52655f8a5bcc: Verifying Checksum 2025-12-04T09:04:23.0625589Z 52655f8a5bcc: Download complete 2025-12-04T09:04:23.0898157Z a64e1a44f22a: Verifying Checksum 2025-12-04T09:04:23.0898521Z a64e1a44f22a: Download complete 2025-12-04T09:04:23.5920086Z 7c40a3faff76: Verifying Checksum 2025-12-04T09:04:23.5920458Z 7c40a3faff76: Download complete 2025-12-04T09:04:24.2653767Z 53c88f1dfeb7: Pull complete 2025-12-04T09:04:24.7886603Z eae668646f44: Pull complete 2025-12-04T09:04:26.5639773Z ff2e6e687b6c: Pull complete 2025-12-04T09:04:31.2934208Z 7c40a3faff76: Pull complete 2025-12-04T09:04:31.5408518Z 967a3b1c8fef: Pull complete 2025-12-04T09:04:32.2780175Z a64e1a44f22a: Pull complete 2025-12-04T09:04:32.4474606Z 52655f8a5bcc: Pull complete 2025-12-04T09:04:32.5077255Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T09:04:32.5308537Z Status: Downloaded newer image for 
public.ecr.aws/docker/library/python:3.13 2025-12-04T09:04:40.2205463Z Thu Dec 4 09:04:40 2025 2025-12-04T09:04:40.2205908Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:04:40.2206414Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:04:40.2206894Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:04:40.2207376Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:04:40.2208235Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:04:40.2208645Z | | | MIG M. | 2025-12-04T09:04:40.2208982Z |=========================================+========================+======================| 2025-12-04T09:04:40.2313559Z | 0 NVIDIA L4 On | 00000000:35:00.0 Off | 0 | 2025-12-04T09:04:40.2314049Z | N/A 34C P8 12W / 72W | 0MiB / 23034MiB | 0% Default | 2025-12-04T09:04:40.2314415Z | | | N/A | 2025-12-04T09:04:40.2314785Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:04:40.2316249Z 2025-12-04T09:04:40.2316510Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:04:40.2316971Z | Processes: | 2025-12-04T09:04:40.2317709Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:04:40.2318340Z | ID ID Usage | 2025-12-04T09:04:40.2318663Z |=========================================================================================| 2025-12-04T09:04:40.2321470Z | No running processes found | 2025-12-04T09:04:40.2321984Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:04:41.5998512Z Command completed after 1 attempt(s). 
2025-12-04T09:04:41.6090319Z Prepare all required actions 2025-12-04T09:04:41.6114632Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-12-04T09:04:41.6114887Z with: 2025-12-04T09:04:41.6115429Z github-token: *** 2025-12-04T09:04:41.6115609Z env: 2025-12-04T09:04:41.6115768Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:04:41.6116016Z HAS_NVIDIA_GPU: true 2025-12-04T09:04:41.6116410Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:04:41.6116735Z ##[endgroup] 2025-12-04T09:04:41.6130591Z ##[group]Run set -eux 2025-12-04T09:04:41.6130810Z set -eux 2025-12-04T09:04:41.6131147Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-12-04T09:04:41.6143784Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:04:41.6144085Z env: 2025-12-04T09:04:41.6144268Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:04:41.6144471Z HAS_NVIDIA_GPU: true 2025-12-04T09:04:41.6144752Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:04:41.6145157Z GITHUB_TOKEN: *** 2025-12-04T09:04:41.6145338Z ##[endgroup] 2025-12-04T09:04:41.6175584Z + python3 .github/scripts/get_workflow_job_id.py 19922768520 i-0e5520d20214059b0 2025-12-04T09:04:42.8178737Z Setting output job-id=57116084862 2025-12-04T09:04:42.8179817Z Setting output job-name=linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T09:04:42.8284545Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:04:42.8285138Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:04:42.8285826Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 & 2025-12-04T09:04:42.8286417Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:04:42.8294150Z shell: /usr/bin/bash --noprofile --norc 
-e -o pipefail {0} 2025-12-04T09:04:42.8294431Z env: 2025-12-04T09:04:42.8294589Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:04:42.8294777Z HAS_NVIDIA_GPU: true 2025-12-04T09:04:42.8295015Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:04:42.8295281Z JOB_ID: 57116084862 2025-12-04T09:04:42.8295749Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T09:04:42.8296215Z WORKFLOW_NAME: trunk 2025-12-04T09:04:42.8296391Z WORKFLOW_RUN_ID: 19922768520 2025-12-04T09:04:42.8296607Z MONITOR_LOG_INTERVAL: 5 2025-12-04T09:04:42.8296805Z MONITOR_DATA_COLLECT_INTERVAL: 1 2025-12-04T09:04:42.8297007Z ##[endgroup] 2025-12-04T09:04:43.0923583Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:04:43.4218282Z Collecting psutil==5.9.8 2025-12-04T09:04:43.4367531Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB) 2025-12-04T09:04:43.5044719Z Collecting dataclasses_json==0.6.7 2025-12-04T09:04:43.5077763Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB) 2025-12-04T09:04:43.5328232Z Collecting nvidia-ml-py==11.525.84 2025-12-04T09:04:43.5362726Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB) 2025-12-04T09:04:43.5664771Z Collecting typing-inspect<1,>=0.4.0 2025-12-04T09:04:43.5691935Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-12-04T09:04:43.6637987Z Collecting marshmallow<4.0.0,>=3.18.0 2025-12-04T09:04:43.6663897Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB) 2025-12-04T09:04:43.7166555Z Collecting packaging>=17.0 2025-12-04T09:04:43.7198598Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-12-04T09:04:43.7416700Z Collecting mypy-extensions>=0.3.0 2025-12-04T09:04:43.7450348Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-12-04T09:04:43.7907657Z Collecting 
typing-extensions>=3.7.4 2025-12-04T09:04:43.7936738Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-12-04T09:04:43.8793251Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json 2025-12-04T09:04:44.1265615Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0 2025-12-04T09:04:44.2774664Z Prepare all required actions 2025-12-04T09:04:44.2775011Z Getting action download info 2025-12-04T09:04:44.4468168Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T09:04:44.6887547Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-12-04T09:04:45.1401996Z ##[group]Run ./.github/actions/download-build-artifacts 2025-12-04T09:04:45.1402290Z with: 2025-12-04T09:04:45.1402492Z name: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:04:45.1402741Z s3-bucket: gha-artifacts 2025-12-04T09:04:45.1402955Z env: 2025-12-04T09:04:45.1403115Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:04:45.1403331Z HAS_NVIDIA_GPU: true 2025-12-04T09:04:45.1403576Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:04:45.1403858Z ##[endgroup] 2025-12-04T09:04:45.1430312Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:04:45.1430599Z with: 2025-12-04T09:04:45.1430832Z name: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:04:45.1431090Z s3-bucket: gha-artifacts 2025-12-04T09:04:45.1431307Z region: us-east-1 2025-12-04T09:04:45.1431483Z env: 2025-12-04T09:04:45.1431644Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:04:45.1431851Z HAS_NVIDIA_GPU: true 2025-12-04T09:04:45.1432106Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:04:45.1432379Z ##[endgroup] 2025-12-04T09:04:45.5936596Z (node:58808) NOTE: We 
are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:04:45.5937051Z 2025-12-04T09:04:45.5937234Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:04:45.5937749Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:04:45.5938286Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:04:45.8667600Z Found 1 objects with prefix pytorch/pytorch/19922768520/linux-jammy-cuda12.8-py3.10-gcc11/ 2025-12-04T09:04:45.8668338Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:04:54.1384307Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:04:54.1389486Z Artifact download has finished successfully 2025-12-04T09:04:54.1661365Z ##[group]Run unzip -o artifacts.zip 2025-12-04T09:04:54.1661631Z unzip -o artifacts.zip 2025-12-04T09:04:54.1669279Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:04:54.1669590Z env: 2025-12-04T09:04:54.1669764Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:04:54.1669972Z HAS_NVIDIA_GPU: true 2025-12-04T09:04:54.1670220Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:04:54.1670491Z ##[endgroup] 2025-12-04T09:04:54.1751871Z Archive: artifacts.zip 2025-12-04T09:04:54.1753017Z creating: dist/ 2025-12-04T09:04:54.1869622Z inflating: dist/.ninja_log 2025-12-04T09:04:56.1721612Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:04:56.1722080Z creating: build/ 2025-12-04T09:04:56.1722338Z creating: build/custom_test_artifacts/ 2025-12-04T09:04:56.1722734Z creating: build/custom_test_artifacts/custom-op-build/ 2025-12-04T09:04:56.1723169Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-12-04T09:04:56.1723689Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:04:56.1730385Z 
inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:04:56.1730874Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/ 2025-12-04T09:04:56.1731734Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:04:56.1732439Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:04:56.1733425Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:04:56.1734801Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:04:56.1736493Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:04:56.1737296Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:04:56.1737827Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:04:56.1738323Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:04:56.1740664Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:04:56.1742275Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:04:56.1743238Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:04:56.1745013Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:04:56.1746993Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:04:56.1747592Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:04:56.1748096Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:04:56.1795350Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:04:56.1843985Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:04:56.1844930Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:04:56.1895881Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:04:56.1896852Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:04:56.1897851Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:04:56.1898779Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:04:56.1899692Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:04:56.1900572Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:04:56.1901449Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:04:56.1902557Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:04:56.1903636Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:04:56.1904448Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:04:56.1905241Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:04:56.1906027Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:04:56.1907027Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:04:56.1908048Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:04:56.1910371Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:04:56.1972889Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:04:56.1973794Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:04:56.2036445Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:04:56.2037215Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:04:56.2037775Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:04:56.2038372Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T09:04:56.2038975Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T09:04:56.2039637Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T09:04:56.2040461Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T09:04:56.2041177Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 
2025-12-04T09:04:56.2041834Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T09:04:56.2042506Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T09:04:56.2043211Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T09:04:56.2043905Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T09:04:56.2045219Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T09:04:56.2046156Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T09:04:56.2064135Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T09:04:56.2226122Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T09:04:56.2226795Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T09:04:56.2227511Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T09:04:56.2228291Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T09:04:56.2229041Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T09:04:56.2229748Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T09:04:56.2230686Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T09:04:56.2231518Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T09:04:56.2232294Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T09:04:56.2233011Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T09:04:56.2233747Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T09:04:56.2252001Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T09:04:56.2318147Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T09:04:56.2319246Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:04:56.2320110Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:04:56.2320737Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T09:04:56.2321332Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T09:04:56.2322680Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T09:04:56.2323405Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-12-04T09:04:56.2325864Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T09:04:56.2326687Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T09:04:56.2327458Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T09:04:56.2463884Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T09:04:56.2507860Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T09:04:56.2508364Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T09:04:56.2508795Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 
2025-12-04T09:04:56.2509321Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:04:56.2515740Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:04:56.2516329Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T09:04:56.2516894Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:04:56.2517718Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:04:56.2518314Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:04:56.2520399Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:04:56.2521810Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:04:56.2522714Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:04:56.2523346Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:04:56.2523956Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:04:56.2526082Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:04:56.2527423Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:04:56.2528513Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:04:56.2530350Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:04:56.2532089Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:04:56.2532651Z creating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:04:56.2533154Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:04:56.2580493Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:04:56.2629087Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:04:56.2630059Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:04:56.2681102Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:04:56.2682055Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:04:56.2683129Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:04:56.2684142Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:04:56.2685080Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:04:56.2685971Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:04:56.2686860Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:04:56.2687609Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:04:56.2688531Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 
2025-12-04T09:04:56.2689370Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:04:56.2690095Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:04:56.2690948Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:04:56.2691920Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:04:56.2692903Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:04:56.2695792Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:04:56.2758610Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:04:56.2759517Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:04:56.2822190Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:04:56.2822923Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:04:56.2823455Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:04:56.2824013Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T09:04:56.2824599Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:04:56.2825272Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:04:56.2826238Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:04:56.2826965Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:04:56.2827520Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:04:56.2828069Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:04:56.2828762Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:04:56.2829586Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:04:56.2830312Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:04:56.2831405Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:04:56.2849574Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:04:56.2900871Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:04:56.2901769Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:04:56.2902486Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:04:56.2903101Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:04:56.2904059Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:04:56.2905800Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:04:56.2906372Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-12-04T09:04:56.2908664Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:04:56.2909619Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 
2025-12-04T09:04:56.2910434Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:04:56.2941608Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:04:56.2942090Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:04:56.2942564Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:04:56.2943136Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:04:56.2949577Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:04:56.2950240Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:04:56.2950883Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:04:56.2951561Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:04:56.2952212Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:04:56.2953662Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:04:56.2955236Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:04:56.2956115Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:04:56.2956770Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:04:56.2957303Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:04:56.2959450Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:04:56.2961194Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:04:56.2962019Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:04:56.2963879Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:04:56.2965696Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:04:56.2966335Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:04:56.2966876Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:04:56.3014309Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:04:56.3062591Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:04:56.3063611Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:04:56.3114244Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:04:56.3115204Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:04:56.3116181Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:04:56.3117353Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:04:56.3118320Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 
2025-12-04T09:04:56.3119260Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:04:56.3120290Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:04:56.3121233Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:04:56.3122309Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:04:56.3123172Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:04:56.3124008Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:04:56.3124834Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:04:56.3125702Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:04:56.3126602Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:04:56.3128935Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:04:56.3191117Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:04:56.3191919Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:04:56.3254874Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:04:56.3255816Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 
2025-12-04T09:04:56.3256392Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:04:56.3257004Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:04:56.3257637Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:04:56.3258344Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:04:56.3259147Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:04:56.3259928Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:04:56.3260783Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:04:56.3261534Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:04:56.3262287Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:04:56.3263038Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:04:56.3263785Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:04:56.3264672Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:04:56.3268527Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:04:56.3365614Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:04:56.3366411Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:04:56.3367196Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:04:56.3368069Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:04:56.3368896Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:04:56.3369659Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T09:04:56.3370432Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:04:56.3371225Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-12-04T09:04:56.3372046Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:04:56.3372832Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:04:56.3373603Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:04:56.3391177Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:04:56.3436308Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:04:56.3437158Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:04:56.3437904Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:04:56.3438580Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:04:56.3439536Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 
2025-12-04T09:04:56.3441142Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:04:56.3441756Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-12-04T09:04:56.3444354Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:04:56.3445353Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:04:56.3448303Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:04:56.3529291Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:04:56.3560541Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:04:56.3560975Z creating: build/lib/ 2025-12-04T09:04:56.3628172Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:04:56.3984951Z inflating: build/lib/libprotobuf.a 2025-12-04T09:04:56.4385449Z inflating: build/lib/libprotoc.a 2025-12-04T09:04:56.4393486Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:04:56.4400512Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:04:56.4407038Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:04:56.4407915Z inflating: build/lib/libclog.a 2025-12-04T09:04:56.4423688Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:04:56.4425841Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:04:56.4440663Z inflating: build/lib/libnnpack.a 2025-12-04T09:04:56.4588354Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:04:56.5271613Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:04:56.5328038Z inflating: build/lib/libgtest.a 2025-12-04T09:04:56.5342526Z inflating: build/lib/libgmock.a 2025-12-04T09:04:56.5343293Z inflating: build/lib/libgtest_main.a 2025-12-04T09:04:56.5344163Z inflating: build/lib/libgmock_main.a 2025-12-04T09:04:56.5416580Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:04:56.5477627Z inflating: 
build/lib/libbenchmark.a 2025-12-04T09:04:56.5478346Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:04:56.5479349Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:04:56.5486118Z inflating: build/lib/libittnotify.a 2025-12-04T09:04:56.5539011Z inflating: build/lib/libasmjit.a 2025-12-04T09:04:56.6473495Z inflating: build/lib/libfbgemm.a 2025-12-04T09:04:56.6497965Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:04:56.6947326Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:04:56.7156355Z inflating: build/lib/libtensorpipe_cuda.a 2025-12-04T09:04:56.7270554Z inflating: build/lib/libgloo.a 2025-12-04T09:04:56.7312248Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:04:56.7680020Z inflating: build/lib/libgloo_cuda.a 2025-12-04T09:04:56.8264790Z inflating: build/lib/libonnx.a 2025-12-04T09:04:56.8280793Z inflating: build/lib/libfmt.a 2025-12-04T09:04:57.6493155Z inflating: build/lib/libdnnl.a 2025-12-04T09:04:57.6873181Z inflating: build/lib/libkineto.a 2025-12-04T09:04:57.6966029Z inflating: build/lib/libc10.so 2025-12-04T09:04:57.7005924Z inflating: build/lib/libc10_cuda.so 2025-12-04T09:04:57.7007954Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:04:57.7009563Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:05:00.1801117Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:05:00.2432060Z inflating: build/lib/libtorch_nvshmem.so 2025-12-04T09:05:02.6485225Z inflating: build/lib/libtorch_cuda.so 2025-12-04T09:05:02.6486587Z inflating: build/lib/libtorch.so 2025-12-04T09:05:02.6528305Z inflating: build/lib/libtorch_cuda_linalg.so 2025-12-04T09:05:02.6585943Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:05:02.6601874Z inflating: build/lib/libjitbackend_test.so 2025-12-04T09:05:02.6621761Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:05:02.6643668Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:05:02.6646092Z inflating: build/lib/libc10d_cuda_test.so 2025-12-04T09:05:02.6650003Z 
inflating: build/lib/libshm.so 2025-12-04T09:05:02.8546288Z inflating: build/lib/libtorch_python.so 2025-12-04T09:05:02.8575413Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:05:02.8575732Z creating: build/bin/ 2025-12-04T09:05:02.8939643Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:05:02.9303835Z inflating: build/bin/protoc 2025-12-04T09:05:02.9351312Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:05:02.9395516Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:05:02.9441558Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:05:02.9487390Z inflating: build/bin/c10_Device_test 2025-12-04T09:05:02.9540878Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:05:02.9588841Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:05:02.9632981Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:05:02.9683825Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:05:02.9732694Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:05:02.9782393Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:05:02.9826958Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:05:02.9876461Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:05:02.9938306Z inflating: build/bin/c10_cow_test 2025-12-04T09:05:02.9985419Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:05:03.0030261Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:05:03.0074341Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:05:03.0121340Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:05:03.0170978Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:05:03.0216720Z inflating: build/bin/c10_Half_test 2025-12-04T09:05:03.0260840Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:05:03.0311261Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:05:03.0358916Z inflating: build/bin/c10_NetworkFlow_test 2025-12-04T09:05:03.0403616Z inflating: build/bin/c10_Synchronized_test 
2025-12-04T09:05:03.0453586Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T09:05:03.0499538Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:05:03.0545945Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:05:03.0590889Z inflating: build/bin/c10_bit_cast_test 2025-12-04T09:05:03.0642187Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:05:03.0692875Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:05:03.0739794Z inflating: build/bin/c10_exception_test 2025-12-04T09:05:03.0784000Z inflating: build/bin/c10_error_test 2025-12-04T09:05:03.0833158Z inflating: build/bin/c10_complex_test 2025-12-04T09:05:03.0878016Z inflating: build/bin/c10_flags_test 2025-12-04T09:05:03.0923730Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:05:03.1056705Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:05:03.1102478Z inflating: build/bin/c10_irange_test 2025-12-04T09:05:03.1150151Z inflating: build/bin/c10_lazy_test 2025-12-04T09:05:03.1194646Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:05:03.1245247Z inflating: build/bin/c10_logging_test 2025-12-04T09:05:03.1310214Z inflating: build/bin/c10_optional_test 2025-12-04T09:05:03.1365046Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:05:03.1494597Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:05:03.1542020Z inflating: build/bin/c10_registry_test 2025-12-04T09:05:03.1592008Z inflating: build/bin/c10_string_util_test 2025-12-04T09:05:03.1638253Z inflating: build/bin/c10_ssize_test 2025-12-04T09:05:03.1682309Z inflating: build/bin/c10_string_view_test 2025-12-04T09:05:03.1721560Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:05:03.1765784Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:05:03.1815172Z inflating: build/bin/c10_typeid_test 2025-12-04T09:05:03.1862286Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-12-04T09:05:03.1909587Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 
2025-12-04T09:05:03.1956811Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T09:05:03.2000881Z inflating: build/bin/c10_cuda_CUDATest 2025-12-04T09:05:03.2048101Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:05:03.2094611Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-12-04T09:05:03.2141983Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:05:03.2188759Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:05:03.2671382Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:05:03.3166525Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:05:03.3671635Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:05:03.3715600Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:05:03.3799571Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:05:03.3844298Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:05:03.3888618Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:05:03.3952180Z inflating: build/bin/Dict_test 2025-12-04T09:05:03.3999068Z inflating: build/bin/Dimname_test 2025-12-04T09:05:03.4056254Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:05:03.4106398Z inflating: build/bin/NamedTensor_test 2025-12-04T09:05:03.4158423Z inflating: build/bin/apply_utils_test 2025-12-04T09:05:03.4210835Z inflating: build/bin/atest 2025-12-04T09:05:03.4266760Z inflating: build/bin/basic 2025-12-04T09:05:03.4314740Z inflating: build/bin/broadcast_test 2025-12-04T09:05:03.4360061Z inflating: build/bin/cpu_allocator_test 2025-12-04T09:05:03.4411119Z inflating: build/bin/cpu_generator_test 2025-12-04T09:05:03.4458035Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:05:03.4538445Z inflating: build/bin/cpu_rng_test 2025-12-04T09:05:03.4584762Z inflating: build/bin/dlconvertor_test 
2025-12-04T09:05:03.4636050Z inflating: build/bin/extension_backend_test 2025-12-04T09:05:03.4684933Z inflating: build/bin/half_test 2025-12-04T09:05:03.4768462Z inflating: build/bin/ivalue_test 2025-12-04T09:05:03.4812734Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:05:03.4859751Z inflating: build/bin/math_kernel_test 2025-12-04T09:05:03.4906455Z inflating: build/bin/memory_format_test 2025-12-04T09:05:03.4954098Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:05:03.5001401Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:05:03.5051112Z inflating: build/bin/native_test 2025-12-04T09:05:03.5096469Z inflating: build/bin/operator_name_test 2025-12-04T09:05:03.5141896Z inflating: build/bin/operators_test 2025-12-04T09:05:03.5188238Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:05:03.5247477Z inflating: build/bin/pow_test 2025-12-04T09:05:03.5297377Z inflating: build/bin/quantized_test 2025-12-04T09:05:03.5342317Z inflating: build/bin/reduce_ops_test 2025-12-04T09:05:03.5387744Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:05:03.5436968Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:05:03.5487631Z inflating: build/bin/scalar_test 2025-12-04T09:05:03.5533894Z inflating: build/bin/StorageUtils_test 2025-12-04T09:05:03.5580124Z inflating: build/bin/stride_properties_test 2025-12-04T09:05:03.5649690Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:05:03.5697479Z inflating: build/bin/test_parallel 2025-12-04T09:05:03.5742902Z inflating: build/bin/thread_init_test 2025-12-04T09:05:03.5793022Z inflating: build/bin/type_ptr_test 2025-12-04T09:05:03.5845862Z inflating: build/bin/type_test 2025-12-04T09:05:03.5892485Z inflating: build/bin/undefined_tensor_test 2025-12-04T09:05:03.5936539Z inflating: build/bin/verify_api_visibility 2025-12-04T09:05:03.5998337Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:05:03.6044228Z inflating: build/bin/weakref_test 2025-12-04T09:05:03.6090083Z inflating: 
build/bin/wrapdim_test 2025-12-04T09:05:03.6135723Z inflating: build/bin/xla_tensor_test 2025-12-04T09:05:03.6187980Z inflating: build/bin/IListRef_test 2025-12-04T09:05:03.6278389Z inflating: build/bin/List_test 2025-12-04T09:05:03.6336623Z inflating: build/bin/KernelFunction_test 2025-12-04T09:05:03.6438355Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:05:03.6520563Z inflating: build/bin/kernel_function_test 2025-12-04T09:05:03.6628476Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:05:03.6715587Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:05:03.6768554Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:05:03.6850612Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:05:03.6896118Z inflating: build/bin/CppSignature_test 2025-12-04T09:05:03.6944695Z inflating: build/bin/backend_fallback_test 2025-12-04T09:05:03.6988200Z inflating: build/bin/op_allowlist_test 2025-12-04T09:05:03.7249350Z inflating: build/bin/op_registration_test 2025-12-04T09:05:03.7307679Z inflating: build/bin/inline_container_test 2025-12-04T09:05:03.7355204Z inflating: build/bin/cuda_allocator_test 2025-12-04T09:05:03.7401851Z inflating: build/bin/cuda_apply_test 2025-12-04T09:05:03.7455225Z inflating: build/bin/cuda_atomic_ops_test 2025-12-04T09:05:03.7505082Z inflating: build/bin/cuda_caching_host_allocator_test 2025-12-04T09:05:03.7566724Z inflating: build/bin/cuda_complex_math_test 2025-12-04T09:05:03.7619265Z inflating: build/bin/cuda_complex_test 2025-12-04T09:05:03.7676194Z inflating: build/bin/cuda_cub_test 2025-12-04T09:05:03.7723539Z inflating: build/bin/cuda_cublas_handle_pool_test 2025-12-04T09:05:03.7767678Z inflating: build/bin/cuda_device_test 2025-12-04T09:05:03.7833561Z inflating: build/bin/cuda_distributions_test 2025-12-04T09:05:03.7880371Z inflating: build/bin/cuda_dlconvertor_test 2025-12-04T09:05:03.7928395Z inflating: build/bin/cuda_event_test 2025-12-04T09:05:03.7971928Z inflating: 
build/bin/cuda_exchange_device_test 2025-12-04T09:05:03.8022558Z inflating: build/bin/cuda_generator_test 2025-12-04T09:05:03.8067248Z inflating: build/bin/cuda_half_test 2025-12-04T09:05:03.8111777Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-12-04T09:05:03.8165914Z inflating: build/bin/cuda_stream_test 2025-12-04T09:05:03.8212738Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-12-04T09:05:03.8256980Z inflating: build/bin/cuda_cudnn_test 2025-12-04T09:05:03.8302881Z inflating: build/bin/cuda_integer_divider_test 2025-12-04T09:05:03.8347420Z inflating: build/bin/cuda_optional_test 2025-12-04T09:05:03.8393987Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-12-04T09:05:03.8440786Z inflating: build/bin/cuda_vectorized_test 2025-12-04T09:05:03.9340075Z inflating: build/bin/test_jit 2025-12-04T09:05:03.9635018Z inflating: build/bin/test_lazy 2025-12-04T09:05:03.9682340Z inflating: build/bin/BackoffTest 2025-12-04T09:05:03.9729801Z inflating: build/bin/FileStoreTest 2025-12-04T09:05:03.9780051Z inflating: build/bin/TCPStoreTest 2025-12-04T09:05:03.9828665Z inflating: build/bin/HashStoreTest 2025-12-04T09:05:03.9840726Z inflating: build/bin/ProcessGroupMPITest 2025-12-04T09:05:03.9843446Z inflating: build/bin/example_allreduce 2025-12-04T09:05:03.9892252Z inflating: build/bin/test_dist_autograd 2025-12-04T09:05:03.9951998Z inflating: build/bin/test_cpp_rpc 2025-12-04T09:05:04.0011450Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:05:04.0061663Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-12-04T09:05:04.0118865Z inflating: build/bin/ProcessGroupNCCLTest 2025-12-04T09:05:04.0173093Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-12-04T09:05:04.1133540Z inflating: build/bin/test_api 2025-12-04T09:05:04.1135930Z inflating: build/bin/parallel_benchmark 2025-12-04T09:05:04.1139402Z inflating: build/bin/torch_shm_manager 2025-12-04T09:05:04.1139875Z creating: .additional_ci_files/ 2025-12-04T09:05:04.1192007Z inflating: 
.additional_ci_files/test-times.json 2025-12-04T09:05:04.1382288Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T09:05:04.1444359Z ##[group]Run rm artifacts.zip 2025-12-04T09:05:04.1444627Z rm artifacts.zip 2025-12-04T09:05:04.1455033Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:04.1455307Z env: 2025-12-04T09:05:04.1455662Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:04.1455860Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:04.1456092Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:04.1456340Z ##[endgroup] 2025-12-04T09:05:04.3048492Z ##[group]Run df -H 2025-12-04T09:05:04.3048688Z df -H 2025-12-04T09:05:04.3055662Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:04.3055946Z env: 2025-12-04T09:05:04.3056110Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:04.3056315Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:04.3056550Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:04.3056811Z ##[endgroup] 2025-12-04T09:05:04.3103434Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:05:04.3103813Z devtmpfs 4.2M 0 4.2M 0% /dev 2025-12-04T09:05:04.3104119Z tmpfs 33G 0 33G 0% /dev/shm 2025-12-04T09:05:04.3104429Z tmpfs 13G 775k 13G 1% /run 2025-12-04T09:05:04.3104725Z /dev/nvme0n1p1 161G 54G 108G 34% / 2025-12-04T09:05:04.3105030Z tmpfs 33G 17k 33G 1% /tmp 2025-12-04T09:05:04.3105335Z /dev/nvme0n1p128 11M 1.4M 9.2M 13% /boot/efi 2025-12-04T09:05:04.3105663Z tmpfs 6.5G 0 6.5G 0% /run/user/0 2025-12-04T09:05:04.3134502Z Prepare all required actions 2025-12-04T09:05:04.3135305Z Getting action download info 2025-12-04T09:05:04.5614668Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T09:05:04.5614932Z with: 2025-12-04T09:05:04.5615090Z env: 2025-12-04T09:05:04.5615241Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:04.5615429Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:04.5615653Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:04.5615898Z 
##[endgroup] 2025-12-04T09:05:04.5737645Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:05:04.5737890Z with: 2025-12-04T09:05:04.5738043Z name: td_results 2025-12-04T09:05:04.5738210Z s3-bucket: gha-artifacts 2025-12-04T09:05:04.5738401Z region: us-east-1 2025-12-04T09:05:04.5738573Z env: 2025-12-04T09:05:04.5738720Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:04.5738913Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:04.5739141Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:04.5739533Z ##[endgroup] 2025-12-04T09:05:04.9732633Z (node:58831) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:05:04.9733090Z 2025-12-04T09:05:04.9733266Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:05:04.9733750Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:05:04.9734253Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:05:05.0669300Z Found 1 objects with prefix pytorch/pytorch/19922768520/td_results/ 2025-12-04T09:05:05.0669898Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:05.1336919Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:05.1341772Z Artifact download has finished successfully 2025-12-04T09:05:05.1598564Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:05:05.1599031Z mkdir -p .additional_ci_files 2025-12-04T09:05:05.1599434Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:05:05.1608269Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:05.1608557Z env: 2025-12-04T09:05:05.1608746Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:05.1608957Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:05.1609210Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:05.1609474Z ##[endgroup] 
2025-12-04T09:05:05.1724102Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T09:05:05.1724433Z .github/scripts/parse_ref.py 2025-12-04T09:05:05.1731411Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:05.1731610Z env: 2025-12-04T09:05:05.1731763Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:05.1731968Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:05.1732191Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:05.1732447Z ##[endgroup] 2025-12-04T09:05:05.1937922Z Setting output branch=main 2025-12-04T09:05:05.2034866Z Prepare all required actions 2025-12-04T09:05:05.2035193Z Getting action download info 2025-12-04T09:05:05.3588635Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:05:05.3588913Z with: 2025-12-04T09:05:05.3589251Z github-token: *** 2025-12-04T09:05:05.3596126Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", 
"rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:05:05.3603691Z job-name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 
2025-12-04T09:05:05.3604144Z env: 2025-12-04T09:05:05.3604312Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:05.3604517Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:05.3604756Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:05.3605035Z ##[endgroup] 2025-12-04T09:05:05.3687211Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:05:05.3687485Z with: 2025-12-04T09:05:05.3687689Z shell: bash 2025-12-04T09:05:05.3687858Z timeout_minutes: 10 2025-12-04T09:05:05.3688046Z max_attempts: 5 2025-12-04T09:05:05.3688224Z retry_wait_seconds: 30 2025-12-04T09:05:05.3689010Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:05.3689690Z polling_interval_seconds: 1 2025-12-04T09:05:05.3689912Z warning_on_retry: true 2025-12-04T09:05:05.3690114Z continue_on_error: false 2025-12-04T09:05:05.3690303Z env: 2025-12-04T09:05:05.3690468Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:05.3690671Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:05.3690911Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:05.3691333Z GITHUB_TOKEN: *** 2025-12-04T09:05:05.3691522Z ##[endgroup] 2025-12-04T09:05:05.4607249Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:05.6749096Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:05:05.8351713Z Collecting requests==2.27.1 2025-12-04T09:05:05.8501584Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:05:06.0374759Z Collecting pyyaml==6.0.2 2025-12-04T09:05:06.0412534Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-12-04T09:05:06.0640988Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10) 2025-12-04T09:05:06.0644442Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in 
/usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10) 2025-12-04T09:05:06.1097810Z Collecting certifi>=2017.4.17 2025-12-04T09:05:06.1134812Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:05:06.4713726Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:05:06.4741842Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:05:06.5539777Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml 2025-12-04T09:05:06.6685684Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-12-04T09:05:07.4391809Z Command completed after 1 attempt(s). 2025-12-04T09:05:07.4461908Z ##[group]Run set -x 2025-12-04T09:05:07.4462109Z set -x 2025-12-04T09:05:07.4462270Z  2025-12-04T09:05:07.4462549Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:07.4462886Z # in runner workspace 2025-12-04T09:05:07.4463165Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:05:07.4471683Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:07.4471955Z env: 2025-12-04T09:05:07.4472117Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:07.4472312Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:07.4472532Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:07.4472974Z ##[endgroup] 2025-12-04T09:05:07.4499870Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:05:07.4665315Z Setting output branch=main 2025-12-04T09:05:07.4720386Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:07.4720722Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:07.4720975Z echo "Job name: ${JOB_NAME}" 2025-12-04T09:05:07.4721197Z  2025-12-04T09:05:07.4721478Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:07.4721856Z # in runner workspace 
2025-12-04T09:05:07.4722185Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:05:07.4722542Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:05:07.4722805Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:05:07.4729972Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": 
"lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" \ 2025-12-04T09:05:07.4737087Z  --selected-test-configs "" \ 2025-12-04T09:05:07.4737349Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:05:07.4737719Z  --tag "${TAG}" \ 2025-12-04T09:05:07.4737935Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:05:07.4738183Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:05:07.4738425Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:05:07.4746082Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:07.4746370Z env: 2025-12-04T09:05:07.4746538Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:07.4746729Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:07.4746978Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:07.4747443Z GITHUB_TOKEN: *** 2025-12-04T09:05:07.4747892Z 
JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:07.4748351Z PR_NUMBER: 2025-12-04T09:05:07.4748520Z TAG: 2025-12-04T09:05:07.4748678Z EVENT_NAME: schedule 2025-12-04T09:05:07.4748863Z SCHEDULE: 29 8 * * * 2025-12-04T09:05:07.4749046Z HEAD_BRANCH: main 2025-12-04T09:05:07.4749222Z ##[endgroup] 2025-12-04T09:05:07.4773657Z Workflow: trunk 2025-12-04T09:05:07.4774518Z Job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:07.6643485Z Setting output keep-going=True 2025-12-04T09:05:07.6643987Z Setting output ci-verbose-test-logs=False 2025-12-04T09:05:07.6644506Z Setting output ci-test-showlocals=False 2025-12-04T09:05:07.6644913Z Setting output ci-no-test-timeout=False 2025-12-04T09:05:07.6645190Z Setting output ci-no-td=False 2025-12-04T09:05:07.6645470Z Setting output ci-td-distributed=False 2025-12-04T09:05:07.6645758Z Setting output is-unstable=False 2025-12-04T09:05:07.6646016Z Setting output reenabled-issues= 2025-12-04T09:05:07.6662471Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": 
"mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": 
"lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, 
"num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": 
"rerun_disabled_tests"}]} 2025-12-04T09:05:07.6678079Z Setting output is-test-matrix-empty=False 2025-12-04T09:05:07.6745422Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:05:07.6745724Z echo "Filtered matrix:" 2025-12-04T09:05:07.6761385Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", 
"shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", 
"rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", 
"rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" 2025-12-04T09:05:07.6777053Z  2025-12-04T09:05:07.6777218Z echo 2025-12-04T09:05:07.6777429Z echo "Is the current job unstable? False" 2025-12-04T09:05:07.6777678Z  2025-12-04T09:05:07.6777835Z echo 2025-12-04T09:05:07.6778027Z echo "Is keep-going label set? True" 2025-12-04T09:05:07.6778259Z  2025-12-04T09:05:07.6778423Z echo 2025-12-04T09:05:07.6778606Z echo "Reenabled issues? 
" 2025-12-04T09:05:07.6785943Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:07.6786240Z env: 2025-12-04T09:05:07.6786509Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:07.6786707Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:07.6786945Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:07.6787207Z ##[endgroup] 2025-12-04T09:05:07.6812789Z Filtered matrix: 2025-12-04T09:05:07.6833916Z {include: [{config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 
3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, 
shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, 
mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}]} 2025-12-04T09:05:07.6849456Z 2025-12-04T09:05:07.6849556Z Is the current job unstable? False 2025-12-04T09:05:07.6849710Z 2025-12-04T09:05:07.6849799Z Is keep-going label set? True 2025-12-04T09:05:07.6849935Z 2025-12-04T09:05:07.6850003Z Reenabled issues? 2025-12-04T09:05:07.6876520Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:07.6876949Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:07.6883991Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:07.6884275Z env: 2025-12-04T09:05:07.6884435Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:07.6884629Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:07.6884861Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:07.6885124Z JOB_TIMEOUT: 600 2025-12-04T09:05:07.6885294Z ##[endgroup] 2025-12-04T09:05:07.6939110Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:07.6939545Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:07.6939883Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:07.6946646Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:07.6946915Z env: 2025-12-04T09:05:07.6947097Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:07.6947291Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:07.6947526Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:07.6947788Z 
##[endgroup] 2025-12-04T09:05:07.7042281Z ##[group]Run set -x 2025-12-04T09:05:07.7042556Z set -x 2025-12-04T09:05:07.7042740Z  2025-12-04T09:05:07.7042933Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:05:07.7043218Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:05:07.7043675Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:05:07.7043938Z  TEST_COMMAND=.ci/onnx/test.sh 2025-12-04T09:05:07.7044153Z else 2025-12-04T09:05:07.7044344Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:07.7044564Z fi 2025-12-04T09:05:07.7044708Z  2025-12-04T09:05:07.7044887Z # Leaving 1GB for the runner and other things 2025-12-04T09:05:07.7045313Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-12-04T09:05:07.7045960Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-12-04T09:05:07.7046460Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-12-04T09:05:07.7046851Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-12-04T09:05:07.7047148Z  2025-12-04T09:05:07.7047335Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:07.7047578Z  SHM_OPTS= 2025-12-04T09:05:07.7047754Z  JENKINS_USER= 2025-12-04T09:05:07.7048001Z  # ensure that docker container cleanly exits in 12 hours 2025-12-04T09:05:07.7048329Z  # if for some reason cleanup action doesn't stop container 2025-12-04T09:05:07.7048606Z  # when job is cancelled 2025-12-04T09:05:07.7048819Z  DOCKER_SHELL_CMD="sleep 12h" 2025-12-04T09:05:07.7049046Z  USED_IMAGE="${DOCKER_IMAGE_S390X}" 2025-12-04T09:05:07.7049261Z else 2025-12-04T09:05:07.7049446Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-12-04T09:05:07.7049693Z  JENKINS_USER="--user jenkins" 2025-12-04T09:05:07.7049912Z  DOCKER_SHELL_CMD= 2025-12-04T09:05:07.7050114Z  USED_IMAGE="${DOCKER_IMAGE}" 2025-12-04T09:05:07.7050315Z fi 2025-12-04T09:05:07.7050453Z  2025-12-04T09:05:07.7050690Z # 
detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:05:07.7051061Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:05:07.7051478Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-12-04T09:05:07.7051849Z # shellcheck disable=SC2086,SC2090 2025-12-04T09:05:07.7052081Z container_name=$(docker run \ 2025-12-04T09:05:07.7052301Z  ${GPU_FLAG:-} \ 2025-12-04T09:05:07.7052510Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-12-04T09:05:07.7052754Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:05:07.7052962Z  -e PR_NUMBER \ 2025-12-04T09:05:07.7053152Z  -e GITHUB_ACTIONS \ 2025-12-04T09:05:07.7053354Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:05:07.7053565Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:05:07.7053761Z  -e GITHUB_JOB \ 2025-12-04T09:05:07.7053951Z  -e GITHUB_RUN_ID \ 2025-12-04T09:05:07.7054153Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:05:07.7054360Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:05:07.7054570Z  -e JOB_ID \ 2025-12-04T09:05:07.7054758Z  -e JOB_NAME \ 2025-12-04T09:05:07.7054940Z  -e BASE_SHA \ 2025-12-04T09:05:07.7055109Z  -e BRANCH \ 2025-12-04T09:05:07.7055285Z  -e SHA1 \ 2025-12-04T09:05:07.7055463Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:05:07.7055661Z  -e IN_WHEEL_TEST \ 2025-12-04T09:05:07.7055854Z  -e SHARD_NUMBER \ 2025-12-04T09:05:07.7056050Z  -e TEST_CONFIG \ 2025-12-04T09:05:07.7056239Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:05:07.7056561Z  -e REENABLED_ISSUES \ 2025-12-04T09:05:07.7056781Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:05:07.7057011Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:05:07.7057208Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:05:07.7057474Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:05:07.7057663Z  -e NO_TD \ 2025-12-04T09:05:07.7057837Z  -e TD_DISTRIBUTED \ 2025-12-04T09:05:07.7058034Z  -e PR_LABELS \ 2025-12-04T09:05:07.7058240Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:05:07.7058467Z  -e SCCACHE_BUCKET \ 2025-12-04T09:05:07.7058659Z  -e SCCACHE_REGION \ 
2025-12-04T09:05:07.7058845Z  -e XLA_CUDA \ 2025-12-04T09:05:07.7059042Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-12-04T09:05:07.7059292Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:05:07.7059551Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:05:07.7059822Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-12-04T09:05:07.7060055Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:05:07.7060284Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-12-04T09:05:07.7060521Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-12-04T09:05:07.7060741Z  -e DASHBOARD_TAG \ 2025-12-04T09:05:07.7060941Z  -e ARTIFACTS_FILE_SUFFIX \ 2025-12-04T09:05:07.7061192Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-12-04T09:05:07.7061483Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-12-04T09:05:07.7061767Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:05:07.7062039Z  --security-opt seccomp=unconfined \ 2025-12-04T09:05:07.7062281Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:05:07.7062479Z  --ipc=host \ 2025-12-04T09:05:07.7062660Z  ${SHM_OPTS} \ 2025-12-04T09:05:07.7062835Z  --tty \ 2025-12-04T09:05:07.7062997Z  --detach \ 2025-12-04T09:05:07.7063186Z  --name="${container_name}" \ 2025-12-04T09:05:07.7063410Z  ${JENKINS_USER} \ 2025-12-04T09:05:07.7063651Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:05:07.7063929Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:05:07.7064154Z  "${USED_IMAGE}" \ 2025-12-04T09:05:07.7064345Z  ${DOCKER_SHELL_CMD} 2025-12-04T09:05:07.7064521Z ) 2025-12-04T09:05:07.7064755Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-12-04T09:05:07.7065039Z  2025-12-04T09:05:07.7065224Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:07.7065648Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-12-04T09:05:07.7066011Z fi 2025-12-04T09:05:07.7066171Z  2025-12-04T09:05:07.7066512Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo 
dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-12-04T09:05:07.7073360Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:07.7073566Z env: 2025-12-04T09:05:07.7073715Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:07.7073913Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:07.7074152Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:07.7074474Z BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:07.7074725Z PR_NUMBER: 2025-12-04T09:05:07.7074902Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:05:07.7075122Z GITHUB_WORKFLOW: trunk 2025-12-04T09:05:07.7075295Z GITHUB_JOB: test 2025-12-04T09:05:07.7075478Z GITHUB_RUN_ID: 19922768520 2025-12-04T09:05:07.7075677Z GITHUB_RUN_NUMBER: 158165 2025-12-04T09:05:07.7075863Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:05:07.7076037Z JOB_ID: 57116084862 2025-12-04T09:05:07.7076453Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:07.7077007Z BRANCH: main 2025-12-04T09:05:07.7077203Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:07.7077473Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:07.7077717Z TEST_CONFIG: default 2025-12-04T09:05:07.7077881Z SHARD_NUMBER: 2 2025-12-04T09:05:07.7078115Z NUM_TEST_SHARDS: 5 2025-12-04T09:05:07.7078279Z EXTRA_FLAGS: 2025-12-04T09:05:07.7078445Z OP_BENCHMARK_TESTS: 2025-12-04T09:05:07.7078619Z REENABLED_ISSUES: 2025-12-04T09:05:07.7078794Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:05:07.7078988Z VERBOSE_TEST_LOGS: False 2025-12-04T09:05:07.7079173Z TEST_SHOWLOCALS: False 2025-12-04T09:05:07.7079351Z NO_TEST_TIMEOUT: False 2025-12-04T09:05:07.7079517Z NO_TD: False 2025-12-04T09:05:07.7079670Z TD_DISTRIBUTED: False 2025-12-04T09:05:07.7079984Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-12-04T09:05:07.7080245Z SCCACHE_REGION: us-east-1 2025-12-04T09:05:07.7080418Z SHM_SIZE: 2g 2025-12-04T09:05:07.7080972Z DOCKER_IMAGE: 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:07.7081957Z DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:07.7082559Z XLA_CUDA: 2025-12-04T09:05:07.7082803Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:05:07.7083137Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:05:07.7083374Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:05:07.7083579Z DASHBOARD_TAG: 2025-12-04T09:05:07.7083915Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-12-04T09:05:07.7084224Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:05:07.7084516Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-12-04T09:05:07.7084909Z ARTIFACTS_FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T09:05:07.7085292Z ##[endgroup] 2025-12-04T09:05:07.7114020Z + [[ default == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:05:07.7114339Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *onnx* ]] 2025-12-04T09:05:07.7114615Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:07.7117490Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-12-04T09:05:07.7138672Z + TOTAL_AVAILABLE_MEMORY_IN_GB='59.453 ' 2025-12-04T09:05:07.7139193Z + TOTAL_MEMORY_WITH_SWAP=62 2025-12-04T09:05:07.7139530Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:05:07.7139860Z + SHM_OPTS=--shm-size=2g 2025-12-04T09:05:07.7140109Z + JENKINS_USER='--user jenkins' 2025-12-04T09:05:07.7140349Z + DOCKER_SHELL_CMD= 2025-12-04T09:05:07.7141057Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:07.7147533Z +++ nproc --ignore=2 2025-12-04T09:05:07.7180560Z ++ docker run --gpus all -e 
NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=59g --memory-swap=62g --env-file=/tmp/github_env_19922768520 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:16.8681044Z + container_name=e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T09:05:16.8682168Z + echo DOCKER_CONTAINER_ID=e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T09:05:16.8682723Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:05:16.8685620Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:05:16.8688207Z + docker exec -t e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-12-04T09:05:17.2744254Z Processing 
./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f) 2025-12-04T09:05:17.6297801Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:05:17.6300838Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:05:17.6304807Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:05:17.6309072Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:05:17.6312347Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:05:17.6316627Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:05:17.6329691Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0) 2025-12-04T09:05:17.6660890Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4) 2025-12-04T09:05:17.6678385Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:05:17.6731006Z Requirement already 
satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:05:18.0302670Z Installing collected packages: torch 2025-12-04T09:05:28.8959827Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:05:28.9471841Z + export TERM=vt100 2025-12-04T09:05:28.9472168Z + TERM=vt100 2025-12-04T09:05:28.9473917Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:05:28.9483441Z + source .ci/pytorch/common.sh 2025-12-04T09:05:28.9487460Z +++ dirname .ci/pytorch/common.sh 2025-12-04T09:05:28.9496161Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:05:28.9497422Z +++ declare -f -t trap_add 2025-12-04T09:05:28.9502595Z ++ set -ex -o pipefail 2025-12-04T09:05:28.9502973Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:05:28.9503279Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:05:28.9507002Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:05:28.9513990Z + source .ci/pytorch/common-build.sh 2025-12-04T09:05:28.9515691Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 != *win-* ]] 2025-12-04T09:05:28.9521570Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:05:28.9529719Z +++ cd .ci/pytorch 2025-12-04T09:05:28.9529967Z +++ pwd -P 2025-12-04T09:05:28.9531933Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-12-04T09:05:28.9532343Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-pch* ]] 2025-12-04T09:05:28.9532646Z ++ which sccache 2025-12-04T09:05:28.9545439Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 2025-12-04T09:05:28.9547455Z ++ sccache --stop-server 2025-12-04T09:05:28.9574665Z ++ true 2025-12-04T09:05:28.9574908Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:05:28.9586122Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:05:28.9586411Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:05:28.9586656Z ++ shift 2025-12-04T09:05:28.9586853Z ++ for trap_add_name in "$@" 2025-12-04T09:05:28.9592884Z ++++ trap -p EXIT 
2025-12-04T09:05:28.9595998Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:05:28.9596247Z ++++ extract_trap_cmd 2025-12-04T09:05:28.9596634Z ++++ printf '%s\n' '' 2025-12-04T09:05:28.9596893Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:05:28.9599099Z ++ trap -- ' 2025-12-04T09:05:28.9599307Z sccache_epilogue' EXIT 2025-12-04T09:05:28.9599519Z ++ [[ -n 1 ]] 2025-12-04T09:05:28.9599869Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-12-04T09:05:28.9600498Z Skipping sccache server initialization, setting environment variables 2025-12-04T09:05:28.9600899Z ++ export SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:05:28.9601154Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:05:28.9601448Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:05:28.9601829Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:05:28.9607982Z ++ export RUST_LOG=sccache::server=error 2025-12-04T09:05:28.9608279Z ++ RUST_LOG=sccache::server=error 2025-12-04T09:05:28.9608498Z ++ sccache --zero-stats 2025-12-04T09:05:29.1293726Z Statistics zeroed. 
2025-12-04T09:05:29.1298714Z ++ which ccache 2025-12-04T09:05:29.1311775Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *rocm* ]] 2025-12-04T09:05:29.1312265Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *s390x* ]] 2025-12-04T09:05:29.1312645Z + [[ -d /var/lib/jenkins/workspace ]] 2025-12-04T09:05:29.1314811Z ++ stat -c %u /var/lib/jenkins/workspace 2025-12-04T09:05:29.1332016Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-12-04T09:05:29.1342796Z + trap_add cleanup_workspace EXIT 2025-12-04T09:05:29.1343156Z + trap_add_cmd=cleanup_workspace 2025-12-04T09:05:29.1343378Z + shift 2025-12-04T09:05:29.1343553Z + for trap_add_name in "$@" 2025-12-04T09:05:29.1343766Z +++ trap -p EXIT 2025-12-04T09:05:29.1343951Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-12-04T09:05:29.1344179Z sccache_epilogue'\'' EXIT' 2025-12-04T09:05:29.1344379Z +++ extract_trap_cmd trap -- ' 2025-12-04T09:05:29.1344573Z sccache_epilogue' EXIT 2025-12-04T09:05:29.1344747Z +++ printf '%s\n' ' 2025-12-04T09:05:29.1344914Z sccache_epilogue' 2025-12-04T09:05:29.1345091Z ++ printf '%s\n' cleanup_workspace 2025-12-04T09:05:29.1345400Z + trap -- ' 2025-12-04T09:05:29.1345642Z sccache_epilogue 2025-12-04T09:05:29.1345839Z cleanup_workspace' EXIT 2025-12-04T09:05:29.1346072Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-12-04T09:05:30.0563731Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-12-04T09:05:30.0581669Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:05:30.0584769Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))' 2025-12-04T09:05:30.4261949Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:05:30.4262593Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']' 2025-12-04T09:05:30.4268096Z +++ realpath .ci/pytorch/test.sh 2025-12-04T09:05:30.4278546Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh 2025-12-04T09:05:30.4286513Z + 
NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch 2025-12-04T09:05:30.4286985Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:05:30.4287763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace 2025-12-04T09:05:30.4288116Z + patch -p4 2025-12-04T09:05:30.4302491Z patching file cudadrv/driver.py 2025-12-04T09:05:30.4302797Z Hunk #1 succeeded at 357 (offset -8 lines). 2025-12-04T09:05:30.4358086Z + popd 2025-12-04T09:05:30.4359140Z ~/workspace 2025-12-04T09:05:30.4359741Z + echo 'Environment variables:' 2025-12-04T09:05:30.4360656Z Environment variables: 2025-12-04T09:05:30.4361242Z + env 2025-12-04T09:05:30.4375321Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:05:30.4375906Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:05:30.4376457Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:30.4377197Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:05:30.4377593Z HOSTNAME=e29498c26bf7 2025-12-04T09:05:30.4378337Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4379332Z GITHUB_ACTION=__run_3 2025-12-04T09:05:30.4379757Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:05:30.4380238Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:05:30.4380615Z TEST_CONFIG=default 2025-12-04T09:05:30.4380958Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:05:30.4381250Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:05:30.4381547Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:05:30.4381939Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:05:30.4382214Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:05:30.4382455Z GITHUB_REF_TYPE=branch 2025-12-04T09:05:30.4382733Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4383036Z XLA_CUDA= 2025-12-04T09:05:30.4383340Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:05:30.4383810Z HUGGING_FACE_HUB_TOKEN=*** 
2025-12-04T09:05:30.4384368Z *** 2025-12-04T09:05:30.4384565Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:05:30.4384832Z GITHUB_ACTIONS=true 2025-12-04T09:05:30.4385051Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:30.4385349Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:05:30.4385712Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4386045Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4386449Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:05:30.4386794Z UCC_HOME=/usr 2025-12-04T09:05:30.4386949Z VERBOSE_TEST_LOGS=False 2025-12-04T09:05:30.4387134Z GITHUB_REF=refs/heads/main 2025-12-04T09:05:30.4387313Z SHARD_NUMBER=2 2025-12-04T09:05:30.4387470Z GITHUB_REF_PROTECTED=true 2025-12-04T09:05:30.4387663Z HOME=/var/lib/jenkins 2025-12-04T09:05:30.4387869Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:05:30.4388107Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:05:30.4388350Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:05:30.4388587Z USE_SYSTEM_NCCL=1 2025-12-04T09:05:30.4388740Z NUM_TEST_SHARDS=5 2025-12-04T09:05:30.4388899Z UCX_HOME=/usr 2025-12-04T09:05:30.4389318Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4390020Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:30.4390687Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4391258Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:05:30.4391615Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:05:30.4391801Z DASHBOARD_TAG= 2025-12-04T09:05:30.4391964Z GITHUB_RUN_ID=19922768520 2025-12-04T09:05:30.4392144Z INSTALLED_OPENBLAS= 
2025-12-04T09:05:30.4392561Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4393029Z GITHUB_ACTOR=huydhn 2025-12-04T09:05:30.4393183Z PR_NUMBER= 2025-12-04T09:05:30.4393329Z DESIRED_CUDA=12.8.1 2025-12-04T09:05:30.4393489Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:05:30.4393878Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:05:30.4394126Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:05:30.4394365Z TERM=vt100 2025-12-04T09:05:30.4394513Z INSTALLED_VISION=yes 2025-12-04T09:05:30.4394677Z BRANCH=main 2025-12-04T09:05:30.4394933Z SCCACHE_REGION=us-east-1 2025-12-04T09:05:30.4395125Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:05:30.4395331Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:05:30.4395509Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:05:30.4395889Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:05:30.4396297Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:05:30.4396539Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:05:30.4396776Z REENABLED_ISSUES= 2025-12-04T09:05:30.4396927Z DOCS= 2025-12-04T09:05:30.4397074Z SHLVL=1 2025-12-04T09:05:30.4397208Z MAX_JOBS=14 2025-12-04T09:05:30.4397359Z GITHUB_ACTOR_ID=475357 2025-12-04T09:05:30.4397601Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4397866Z GITHUB_REF_NAME=main 2025-12-04T09:05:30.4398131Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:05:30.4398437Z GITHUB_JOB=test 2025-12-04T09:05:30.4398595Z NO_TEST_TIMEOUT=False 2025-12-04T09:05:30.4398770Z TD_DISTRIBUTED=False 2025-12-04T09:05:30.4398953Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:05:30.4399155Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:05:30.4399337Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:05:30.4399525Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:05:30.4400187Z 
PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:05:30.4400751Z GITHUB_BASE_REF= 2025-12-04T09:05:30.4400915Z INSTALLED_ACL= 2025-12-04T09:05:30.4401251Z ARTIFACTS_FILE_SUFFIX=test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T09:05:30.4401624Z CI=true 2025-12-04T09:05:30.4401787Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:05:30.4402023Z RUST_LOG=sccache::server=error 2025-12-04T09:05:30.4402204Z JOB_ID=57116084862 2025-12-04T09:05:30.4402361Z GITHUB_HEAD_REF= 2025-12-04T09:05:30.4402523Z GITHUB_ACTION_REF= 2025-12-04T09:05:30.4402717Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:05:30.4402968Z TEST_SHOWLOCALS=False 2025-12-04T09:05:30.4403140Z GITHUB_WORKFLOW=trunk 2025-12-04T09:05:30.4403309Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:05:30.4403746Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4404186Z NO_TD=False 2025-12-04T09:05:30.4404354Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:05:30.4404563Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:05:30.4404780Z _=/usr/bin/env 2025-12-04T09:05:30.4405028Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:05:30.4405398Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:05:30.4504803Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T09:05:30.4505803Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T09:05:30.4506616Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T09:05:30.4507251Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T09:05:30.4507726Z + BUILD_DIR=build 2025-12-04T09:05:30.4508034Z + 
BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:05:30.4508380Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:05:30.4508682Z + SHARD_NUMBER=2 2025-12-04T09:05:30.4508965Z + NUM_TEST_SHARDS=5 2025-12-04T09:05:30.4509273Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:05:30.4509657Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:05:30.4510170Z + export VALGRIND=ON 2025-12-04T09:05:30.4510424Z + VALGRIND=ON 2025-12-04T09:05:30.4510640Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *clang9* ]] 2025-12-04T09:05:30.4511121Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:05:30.4511378Z + detect_cuda_arch 2025-12-04T09:05:30.4511574Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:05:30.4511838Z + command -v nvidia-smi 2025-12-04T09:05:30.4512158Z /usr/bin/nvidia-smi 2025-12-04T09:05:30.4516534Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-12-04T09:05:30.4517583Z ++ tail -n 1 2025-12-04T09:05:30.4716077Z + TORCH_CUDA_ARCH_LIST=8.9 2025-12-04T09:05:30.4716516Z + export TORCH_CUDA_ARCH_LIST 2025-12-04T09:05:30.4716930Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *s390x* ]] 2025-12-04T09:05:30.4717556Z + [[ 0 == \1 ]] 2025-12-04T09:05:30.4717824Z + [[ True == \1 ]] 2025-12-04T09:05:30.4718048Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *bazel* ]] 2025-12-04T09:05:30.4721274Z ++ realpath build/custom_test_artifacts 2025-12-04T09:05:30.4730552Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-12-04T09:05:30.4731102Z + [[ -n '' ]] 2025-12-04T09:05:30.4731292Z + echo 'Environment variables' 2025-12-04T09:05:30.4731510Z Environment variables 2025-12-04T09:05:30.4731687Z + env 2025-12-04T09:05:30.4737618Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:05:30.4737972Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:05:30.4738369Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:30.4739044Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:05:30.4739412Z 
HOSTNAME=e29498c26bf7 2025-12-04T09:05:30.4740181Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4740637Z GITHUB_ACTION=__run_3 2025-12-04T09:05:30.4740948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:05:30.4741327Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:05:30.4741658Z TEST_CONFIG=default 2025-12-04T09:05:30.4741977Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:05:30.4742383Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:05:30.4742792Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:05:30.4743315Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:05:30.4743688Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:05:30.4744044Z GITHUB_REF_TYPE=branch 2025-12-04T09:05:30.4744359Z TORCH_CUDA_ARCH_LIST=8.9 2025-12-04T09:05:30.4744754Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4745174Z XLA_CUDA= 2025-12-04T09:05:30.4745456Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:05:30.4746234Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:05:30.4746650Z *** 2025-12-04T09:05:30.4746920Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:05:30.4747281Z GITHUB_ACTIONS=true 2025-12-04T09:05:30.4747593Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:30.4748022Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:05:30.4748513Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4748986Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4749627Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:05:30.4750208Z UCC_HOME=/usr 2025-12-04T09:05:30.4750497Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:05:30.4750826Z VERBOSE_TEST_LOGS=False 2025-12-04T09:05:30.4751146Z GITHUB_REF=refs/heads/main 2025-12-04T09:05:30.4751461Z SHARD_NUMBER=2 2025-12-04T09:05:30.4751734Z GITHUB_REF_PROTECTED=true 2025-12-04T09:05:30.4752044Z HOME=/var/lib/jenkins 2025-12-04T09:05:30.4752388Z 
GITHUB_API_URL=https://api.github.com 2025-12-04T09:05:30.4752787Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:05:30.4753228Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:05:30.4753664Z USE_SYSTEM_NCCL=1 2025-12-04T09:05:30.4753851Z NUM_TEST_SHARDS=5 2025-12-04T09:05:30.4753998Z UCX_HOME=/usr 2025-12-04T09:05:30.4754424Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4755124Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:30.4755990Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4756572Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:05:30.4757057Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:05:30.4757256Z DASHBOARD_TAG= 2025-12-04T09:05:30.4757414Z GITHUB_RUN_ID=19922768520 2025-12-04T09:05:30.4757598Z INSTALLED_OPENBLAS= 2025-12-04T09:05:30.4758038Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4758499Z GITHUB_ACTOR=huydhn 2025-12-04T09:05:30.4758657Z PR_NUMBER= 2025-12-04T09:05:30.4758804Z DESIRED_CUDA=12.8.1 2025-12-04T09:05:30.4758955Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:05:30.4759117Z VALGRIND=ON 2025-12-04T09:05:30.4759273Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:05:30.4759505Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:05:30.4759751Z TERM=vt100 2025-12-04T09:05:30.4760032Z INSTALLED_VISION=yes 2025-12-04T09:05:30.4760204Z BRANCH=main 2025-12-04T09:05:30.4760358Z SCCACHE_REGION=us-east-1 2025-12-04T09:05:30.4760544Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:05:30.4760736Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:05:30.4760922Z CUDA_PATH=/usr/local/cuda 
2025-12-04T09:05:30.4761290Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:05:30.4761694Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:05:30.4761951Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:05:30.4762193Z REENABLED_ISSUES= 2025-12-04T09:05:30.4762342Z DOCS= 2025-12-04T09:05:30.4762478Z SHLVL=1 2025-12-04T09:05:30.4762615Z MAX_JOBS=14 2025-12-04T09:05:30.4762758Z GITHUB_ACTOR_ID=475357 2025-12-04T09:05:30.4762997Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:30.4763267Z GITHUB_REF_NAME=main 2025-12-04T09:05:30.4763529Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:05:30.4763831Z GITHUB_JOB=test 2025-12-04T09:05:30.4763992Z NO_TEST_TIMEOUT=False 2025-12-04T09:05:30.4764165Z TD_DISTRIBUTED=False 2025-12-04T09:05:30.4764356Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:05:30.4764569Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:05:30.4764749Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:05:30.4764937Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:05:30.4765496Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:05:30.4766053Z GITHUB_BASE_REF= 2025-12-04T09:05:30.4766211Z INSTALLED_ACL= 2025-12-04T09:05:30.4766550Z ARTIFACTS_FILE_SUFFIX=test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T09:05:30.4766925Z CI=true 2025-12-04T09:05:30.4767071Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:05:30.4767302Z RUST_LOG=sccache::server=error 2025-12-04T09:05:30.4767486Z JOB_ID=57116084862 2025-12-04T09:05:30.4767661Z GITHUB_HEAD_REF= 2025-12-04T09:05:30.4767812Z GITHUB_ACTION_REF= 2025-12-04T09:05:30.4768026Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:05:30.4768272Z TEST_SHOWLOCALS=False 2025-12-04T09:05:30.4768436Z GITHUB_WORKFLOW=trunk 
2025-12-04T09:05:30.4768613Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:05:30.4769045Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_68cc5a94-c390-4b58-b946-14203b5a4e58 2025-12-04T09:05:30.4769483Z NO_TD=False 2025-12-04T09:05:30.4769647Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:05:30.4769861Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:05:30.4770175Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:05:30.4770462Z _=/usr/bin/env 2025-12-04T09:05:30.4770619Z + echo 'Testing pytorch' 2025-12-04T09:05:30.4770798Z Testing pytorch 2025-12-04T09:05:30.4770958Z + export LANG=C.UTF-8 2025-12-04T09:05:30.4771123Z + LANG=C.UTF-8 2025-12-04T09:05:30.4771401Z + PR_NUMBER= 2025-12-04T09:05:30.4771565Z + [[ default == \d\e\f\a\u\l\t ]] 2025-12-04T09:05:30.4771774Z + export CUDA_VISIBLE_DEVICES=0 2025-12-04T09:05:30.4771972Z + CUDA_VISIBLE_DEVICES=0 2025-12-04T09:05:30.4772154Z + export HIP_VISIBLE_DEVICES=0 2025-12-04T09:05:30.4772421Z + HIP_VISIBLE_DEVICES=0 2025-12-04T09:05:30.4772603Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:05:30.4772808Z + [[ default == \s\l\o\w ]] 2025-12-04T09:05:30.4773058Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *slow-gradcheck* ]] 2025-12-04T09:05:30.4773373Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:05:30.4773632Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:05:30.4773874Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:05:30.4774087Z + [[ default == *crossref* ]] 2025-12-04T09:05:30.4774304Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:05:30.4774566Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:05:30.4774840Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:05:30.4775093Z + pip_install ninja==1.10.2 2025-12-04T09:05:30.4775347Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:05:30.4775691Z + python3 -m pip 
install --progress-bar off ninja==1.10.2 2025-12-04T09:05:31.2389876Z Collecting ninja==1.10.2 2025-12-04T09:05:31.2597517Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:05:31.2868142Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T09:05:31.6365854Z Installing collected packages: ninja 2025-12-04T09:05:31.6366381Z Attempting uninstall: ninja 2025-12-04T09:05:31.6373396Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:05:31.6395630Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:05:31.6516665Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:05:31.7130726Z Successfully installed ninja-1.10.2 2025-12-04T09:05:31.7567980Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:05:31.7569436Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:05:31.7570345Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:05:31.7570734Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *asan* ]] 2025-12-04T09:05:31.7571110Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-debug* ]] 2025-12-04T09:05:31.7571722Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:05:31.7572355Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass' 2025-12-04T09:05:31.7572983Z We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. 
Expect the assertion to pass 2025-12-04T09:05:31.7573416Z + cd test 2025-12-04T09:05:31.7573749Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:05:33.1731254Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:05:33.1731728Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:05:33.1732207Z + [[ default == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:05:33.1736367Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:05:33.1736727Z + [[ default == *pr_time_benchmarks* ]] 2025-12-04T09:05:33.1737022Z + [[ default == *dynamo_eager* ]] 2025-12-04T09:05:33.1737272Z + [[ default == *aot_eager* ]] 2025-12-04T09:05:33.1737515Z + [[ default == *aot_inductor* ]] 2025-12-04T09:05:33.1737790Z + [[ default == *max_autotune_inductor* ]] 2025-12-04T09:05:33.1738058Z + [[ default == *inductor* ]] 2025-12-04T09:05:33.1738295Z + [[ default == *dynamic* ]] 2025-12-04T09:05:33.1738522Z + [[ default == *cpu* ]] 2025-12-04T09:05:33.1738731Z + [[ default == *xpu* ]] 2025-12-04T09:05:33.1738986Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:05:33.1757224Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:05:33.1757636Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-bazel-* ]] 2025-12-04T09:05:33.1760543Z + cd test 2025-12-04T09:05:33.1761251Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:05:34.6320231Z PyTorch built with: 2025-12-04T09:05:34.6320529Z - GCC 11.4 2025-12-04T09:05:34.6320735Z - C++ Version: 201703 2025-12-04T09:05:34.6321265Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:05:34.6321936Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:05:34.6322326Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:05:34.6322636Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:05:34.6322929Z - NNPACK is enabled 2025-12-04T09:05:34.6323157Z - CPU capability usage: AVX2 2025-12-04T09:05:34.6323407Z - CUDA Runtime 12.8 2025-12-04T09:05:34.6323877Z - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_89,code=sm_89 2025-12-04T09:05:34.6324364Z - CuDNN 91.0.2 (built against CUDA 12.9) 2025-12-04T09:05:34.6328357Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=35b7a9a26c5923d98aebaa41a031dae21788a9ee, CUDA_VERSION=12.8, CUDNN_VERSION=9.10.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:05:34.6331918Z 2025-12-04T09:05:34.8978872Z + cd test 2025-12-04T09:05:34.8979293Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:05:36.0642371Z ATen/Parallel: 2025-12-04T09:05:36.0642687Z at::get_num_threads() : 8 2025-12-04T09:05:36.0642972Z at::get_num_interop_threads() : 8 
2025-12-04T09:05:36.0643261Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-12-04T09:05:36.0643536Z omp_get_max_threads() : 8 2025-12-04T09:05:36.0644058Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:05:36.0644629Z mkl_get_max_threads() : 8 2025-12-04T09:05:36.0644974Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:05:36.0645372Z std::thread::hardware_concurrency() : 16 2025-12-04T09:05:36.0645651Z Environment variables: 2025-12-04T09:05:36.0645876Z OMP_NUM_THREADS : [not set] 2025-12-04T09:05:36.0646132Z MKL_NUM_THREADS : [not set] 2025-12-04T09:05:36.0646371Z ATen parallel backend: OpenMP 2025-12-04T09:05:36.0646529Z 2025-12-04T09:05:36.2945935Z + [[ default == *numpy_2* ]] 2025-12-04T09:05:36.2946330Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:05:36.2946679Z + [[ default == *backward* ]] 2025-12-04T09:05:36.2946977Z + [[ default == *libtorch_agnostic_targetting* ]] 2025-12-04T09:05:36.2947270Z + [[ default == *xla* ]] 2025-12-04T09:05:36.2947491Z + [[ default == *vllm* ]] 2025-12-04T09:05:36.2947725Z + [[ default == *executorch* ]] 2025-12-04T09:05:36.2947978Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:05:36.2948665Z + [[ default == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:05:36.2949035Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:05:36.2949365Z + [[ default == distributed ]] 2025-12-04T09:05:36.2949626Z + [[ default == *operator_benchmark* ]] 2025-12-04T09:05:36.2949915Z + [[ default == *operator_microbenchmark* ]] 2025-12-04T09:05:36.2950424Z + [[ default == *attention_microbenchmark* ]] 2025-12-04T09:05:36.2950708Z + [[ default == *inductor_distributed* ]] 2025-12-04T09:05:36.2950989Z + [[ default == *inductor-halide* ]] 2025-12-04T09:05:36.2951269Z + [[ default == *inductor-pallas* ]] 2025-12-04T09:05:36.2951578Z + [[ default == *inductor-triton-cpu* ]] 
2025-12-04T09:05:36.2951880Z + [[ default == *inductor-micro-benchmark* ]] 2025-12-04T09:05:36.2952197Z + [[ default == *aoti_cross_compile_for_windows* ]] 2025-12-04T09:05:36.2952505Z + [[ default == *huggingface* ]] 2025-12-04T09:05:36.2952750Z + [[ default == *timm* ]] 2025-12-04T09:05:36.2952979Z + [[ default == cachebench ]] 2025-12-04T09:05:36.2953232Z + [[ default == verify_cachebench ]] 2025-12-04T09:05:36.2953482Z + [[ default == *torchbench* ]] 2025-12-04T09:05:36.2953740Z + [[ default == *inductor_cpp_wrapper* ]] 2025-12-04T09:05:36.2954017Z + [[ default == *inductor_core* ]] 2025-12-04T09:05:36.2954262Z + [[ default == *inductor* ]] 2025-12-04T09:05:36.2954505Z + [[ default == *einops* ]] 2025-12-04T09:05:36.2954748Z + [[ default == *dynamo_core* ]] 2025-12-04T09:05:36.2954995Z + [[ default == *dynamo_wrapped* ]] 2025-12-04T09:05:36.2955299Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:05:36.2955592Z + [[ 2 == 1 ]] 2025-12-04T09:05:36.2955767Z + [[ 2 == 2 ]] 2025-12-04T09:05:36.2955961Z + [[ 5 -gt 1 ]] 2025-12-04T09:05:36.2956161Z + install_torchvision 2025-12-04T09:05:36.2956376Z + local orig_preload 2025-12-04T09:05:36.2956583Z + local commit 2025-12-04T09:05:36.2956784Z ++ get_pinned_commit vision 2025-12-04T09:05:36.2957038Z ++ cat .github/ci_commit_pins/vision.txt 2025-12-04T09:05:36.2966837Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:05:36.2967288Z + orig_preload= 2025-12-04T09:05:36.2967490Z + '[' -n '' ']' 2025-12-04T09:05:36.2967781Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:05:36.2968069Z + export FORCE_CUDA=1 2025-12-04T09:05:36.2968328Z + FORCE_CUDA=1 2025-12-04T09:05:36.2968500Z + export WITH_CUDA=1 2025-12-04T09:05:36.2968669Z + WITH_CUDA=1 2025-12-04T09:05:36.2969083Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision 2025-12-04T09:05:36.2969723Z + local 
build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:05:36.2970126Z + local wheel_dir=dist/vision 2025-12-04T09:05:36.2970324Z + local found_whl=0 2025-12-04T09:05:36.2970508Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:05:36.2970716Z + [[ -f dist/vision/*.whl ]] 2025-12-04T09:05:36.2970900Z + '[' 0 == 0 ']' 2025-12-04T09:05:36.2971405Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:05:36.5813901Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:05:36.5817964Z Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-rn60iyfs 2025-12-04T09:05:36.5841298Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-rn60iyfs 2025-12-04T09:05:37.9492053Z Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e' 2025-12-04T09:05:37.9515057Z Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:05:38.0483368Z Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:05:39.8509343Z Preparing metadata (pyproject.toml) ... done 2025-12-04T09:05:39.8542901Z Building wheels for collected packages: torchvision 2025-12-04T09:06:51.6966651Z Building wheel for torchvision (pyproject.toml) ... 
done 2025-12-04T09:06:51.6995877Z Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1786644 sha256=d9c504dd778bc57c39d094deb7a120c77dff6453fc9c022eea9fa4143dbf79b1 2025-12-04T09:06:51.6997072Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac 2025-12-04T09:06:51.7031525Z Successfully built torchvision 2025-12-04T09:06:51.7968672Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:06:51.7969275Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:06:51.7969938Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl') 2025-12-04T09:06:51.7970356Z + local args 2025-12-04T09:06:51.7970719Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-12-04T09:06:51.7971149Z + for path in "${args[@]}" 2025-12-04T09:06:51.7971562Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl' 2025-12-04T09:06:51.7972175Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:06:51.7972866Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:06:52.0931172Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:06:52.1014085Z Installing collected packages: torchvision 2025-12-04T09:06:52.5121697Z Successfully installed torchvision-0.25.0a0+617079d 2025-12-04T09:06:52.5395949Z + '[' -n '' ']' 2025-12-04T09:06:52.5396269Z + test_python_shard 2 2025-12-04T09:06:52.5396506Z + [[ -z 5 ]] 2025-12-04T09:06:52.5397251Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 2 5 --verbose --upload-artifacts-while-running 
2025-12-04T09:06:56.8538795Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:06:56.8996095Z Ignoring disabled issues: [''] 2025-12-04T09:06:56.9073536Z Found test times from artifacts 2025-12-04T09:06:56.9392189Z Found test times from artifacts 2025-12-04T09:06:56.9401453Z Running all tests 2025-12-04T09:06:56.9987745Z Running parallel tests on 1 processes 2025-12-04T09:06:56.9994724Z Name: tests to run (est. time: 273.21min) 2025-12-04T09:06:56.9995230Z Serial tests (127): 2025-12-04T09:06:56.9995617Z inductor/test_aot_inductor 2/4 2025-12-04T09:06:56.9996060Z dynamo/test_repros 1/1 2025-12-04T09:06:56.9996467Z inductor/test_flex_attention 2/6 2025-12-04T09:06:56.9996802Z inductor/test_cuda_select_algorithm 1/1 2025-12-04T09:06:56.9997144Z inductor/test_compile_subprocess 2/2 2025-12-04T09:06:56.9997448Z test_decomp 1/22 2025-12-04T09:06:56.9997665Z test_decomp 5/22 2025-12-04T09:06:56.9997868Z test_decomp 10/22 2025-12-04T09:06:56.9998073Z test_decomp 15/22 2025-12-04T09:06:56.9998299Z test_decomp 20/22 2025-12-04T09:06:56.9998518Z test_ci_sanity_check_fail 1/1 2025-12-04T09:06:56.9998761Z test_ops 3/9 2025-12-04T09:06:56.9998953Z test_ops 8/9 2025-12-04T09:06:56.9999196Z inductor/test_torchinductor_dynamic_shapes 4/4 2025-12-04T09:06:56.9999509Z inductor/test_torchinductor_opinfo 2/13 2025-12-04T09:06:56.9999753Z inductor/test_torchinductor_opinfo 7/13 2025-12-04T09:06:57.0000090Z inductor/test_torchinductor_opinfo 12/13 2025-12-04T09:06:57.0000324Z inductor/test_cuda_repro 1/1 2025-12-04T09:06:57.0000538Z inductor/test_compiled_autograd 1/2 2025-12-04T09:06:57.0000762Z inductor/test_layout_optim 1/1 2025-12-04T09:06:57.0001269Z dynamo/test_exc 1/1 2025-12-04T09:06:57.0001576Z inductor/test_aot_inductor_arrayref 1/2 2025-12-04T09:06:57.0001896Z inductor/test_halide 1/1 2025-12-04T09:06:57.0002100Z inductor/test_deterministic 1/3 
2025-12-04T09:06:57.0002315Z dynamo/test_deque_reconstruct 1/1 2025-12-04T09:06:57.0002746Z inductor/test_inductor_annotations 1/1 2025-12-04T09:06:57.0003128Z inductor/test_compile_worker 1/1 2025-12-04T09:06:57.0003495Z dynamo/test_fx_passes_pre_grad 1/1 2025-12-04T09:06:57.0003850Z inductor/test_fp8 1/1 2025-12-04T09:06:57.0004139Z inductor/test_flex_flash 1/1 2025-12-04T09:06:57.0004344Z inductor/test_segmented_tree 1/1 2025-12-04T09:06:57.0004561Z inductor/test_kernel_optimization 1/1 2025-12-04T09:06:57.0004931Z inductor/test_metrics 1/1 2025-12-04T09:06:57.0005142Z export/test_unflatten_training_ir 1/1 2025-12-04T09:06:57.0005357Z inductor/test_triton_kernels 1/1 2025-12-04T09:06:57.0005565Z inductor/test_lookup_table 1/1 2025-12-04T09:06:57.0005813Z inductor/test_cutedsl_template 1/1 2025-12-04T09:06:57.0006112Z inductor/test_benchmark_fusion 1/1 2025-12-04T09:06:57.0006463Z export/test_serdes 1/1 2025-12-04T09:06:57.0006779Z inductor/test_control_deps 1/1 2025-12-04T09:06:57.0007132Z inductor/test_benchmarking 1/1 2025-12-04T09:06:57.0007366Z inductor/test_helion_kernels 1/1 2025-12-04T09:06:57.0007575Z inductor/test_quantization 1/1 2025-12-04T09:06:57.0007780Z inductor/test_best_config 1/1 2025-12-04T09:06:57.0007973Z export/test_tools 1/1 2025-12-04T09:06:57.0008171Z inductor/test_compiled_optimizers 1/3 2025-12-04T09:06:57.0008410Z inductor/test_aot_inductor_custom_ops 1/1 2025-12-04T09:06:57.0008632Z inductor/test_control_flow 4/5 2025-12-04T09:06:57.0008837Z dynamo/test_cudagraphs 1/1 2025-12-04T09:06:57.0009040Z inductor/test_alignment 1/1 2025-12-04T09:06:57.0009268Z dynamo/test_guard_serialization 1/1 2025-12-04T09:06:57.0009500Z inductor/test_needs_exact_strides 1/1 2025-12-04T09:06:57.0009734Z inductor/test_auto_functionalize 1/1 2025-12-04T09:06:57.0009954Z dynamo/test_modes 1/1 2025-12-04T09:06:57.0010152Z inductor/test_custom_partitioner_fn 1/1 2025-12-04T09:06:57.0010381Z dynamo/test_debug_utils 1/1 2025-12-04T09:06:57.0010583Z 
dynamo/test_base_hop 1/1 2025-12-04T09:06:57.0010776Z dynamo/test_export 1/1 2025-12-04T09:06:57.0010978Z dynamo/test_python_dispatcher 1/1 2025-12-04T09:06:57.0011205Z export/test_swap 1/1 2025-12-04T09:06:57.0011389Z export/test_unflatten 1/1 2025-12-04T09:06:57.0011593Z dynamo/test_verify_correctness 1/1 2025-12-04T09:06:57.0011846Z dynamo/test_wrap_inductor_compiled_regions 1/1 2025-12-04T09:06:57.0012110Z dynamo/test_cudagraphs_expandable_segments 1/1 2025-12-04T09:06:57.0012346Z inductor/test_caching 1/1 2025-12-04T09:06:57.0012542Z dynamo/test_reorder_logs 1/1 2025-12-04T09:06:57.0012742Z dynamo/test_subclasses 1/1 2025-12-04T09:06:57.0012931Z dynamo/test_comptime 1/1 2025-12-04T09:06:57.0013135Z test_privateuseone_python_backend 1/1 2025-12-04T09:06:57.0013367Z functorch/test_rearrange 1/1 2025-12-04T09:06:57.0013563Z functorch/test_parsing 1/1 2025-12-04T09:06:57.0013753Z test_varlen_attention 1/1 2025-12-04T09:06:57.0013941Z test_mkl_verbose 1/1 2025-12-04T09:06:57.0014121Z test_cpp_api_parity 1/1 2025-12-04T09:06:57.0014304Z test_autoload 1/1 2025-12-04T09:06:57.0014491Z nn/attention/test_open_registry 1/1 2025-12-04T09:06:57.0014700Z xpu/test_fusion 1/1 2025-12-04T09:06:57.0014872Z test_foreach 1/1 2025-12-04T09:06:57.0015036Z test_pytree 1/1 2025-12-04T09:06:57.0015204Z test_namedtuple_return_api 1/1 2025-12-04T09:06:57.0015425Z profiler/test_record_function 1/1 2025-12-04T09:06:57.0015655Z test_compile_benchmark_util 1/1 2025-12-04T09:06:57.0015893Z test_set_default_mobile_cpu_allocator 1/1 2025-12-04T09:06:57.0016128Z test_fake_tensor 1/1 2025-12-04T09:06:57.0016308Z test_binary_ufuncs 1/1 2025-12-04T09:06:57.0016599Z test_meta 2/4 2025-12-04T09:06:57.0016757Z test_fx 1/1 2025-12-04T09:06:57.0016919Z test_ops_gradients 2/4 2025-12-04T09:06:57.0017292Z test_nestedtensor 3/4 2025-12-04T09:06:57.0017479Z functorch/test_control_flow 4/4 2025-12-04T09:06:57.0017705Z complex_tensor/test_complex_tensor 3/3 2025-12-04T09:06:57.0018084Z optim/test_optim 1/1 
2025-12-04T09:06:57.0018282Z torch_np/numpy_tests/fft/test_pocketfft 1/1 2025-12-04T09:06:57.0018515Z functorch/test_ops 1/9 2025-12-04T09:06:57.0018699Z functorch/test_ops 6/9 2025-12-04T09:06:57.0018910Z torch_np/numpy_tests/core/test_getlimits 1/1 2025-12-04T09:06:57.0019155Z torch_np/test_ndarray_methods 1/1 2025-12-04T09:06:57.0019364Z test_view_ops 1/1 2025-12-04T09:06:57.0019534Z test_nn 1/1 2025-12-04T09:06:57.0019729Z torch_np/numpy_tests/lib/test_index_tricks 1/1 2025-12-04T09:06:57.0019969Z test_jit_autocast 1/1 2025-12-04T09:06:57.0020150Z nn/test_pooling 1/1 2025-12-04T09:06:57.0020326Z nn/test_embedding 1/1 2025-12-04T09:06:57.0020520Z test_xnnpack_integration 1/1 2025-12-04T09:06:57.0020724Z test_cuda_trace 1/1 2025-12-04T09:06:57.0020889Z test_native_mha 1/1 2025-12-04T09:06:57.0021092Z torch_np/numpy_tests/core/test_numerictypes 1/1 2025-12-04T09:06:57.0021347Z test_cuda_nvml_based_avail 1/1 2025-12-04T09:06:57.0021562Z test_function_schema 1/1 2025-12-04T09:06:57.0021752Z test_accelerator 1/1 2025-12-04T09:06:57.0021925Z nn/test_init 1/1 2025-12-04T09:06:57.0022120Z torch_np/numpy_tests/core/test_scalar_methods 1/1 2025-12-04T09:06:57.0022379Z torch_np/numpy_tests/fft/test_helper 1/1 2025-12-04T09:06:57.0022606Z test_mobile_optimizer 1/1 2025-12-04T09:06:57.0022791Z test_overrides 1/1 2025-12-04T09:06:57.0022975Z torch_np/test_function_base 1/1 2025-12-04T09:06:57.0023183Z test_type_promotion 1/1 2025-12-04T09:06:57.0023373Z torch_np/test_scalars_0D_arrays 1/1 2025-12-04T09:06:57.0023586Z test_cuda_primary_ctx 1/1 2025-12-04T09:06:57.0023785Z profiler/test_profiler_tree 1/1 2025-12-04T09:06:57.0024016Z torch_np/numpy_tests/lib/test_arraysetops 1/1 2025-12-04T09:06:57.0024239Z test_dlpack 1/1 2025-12-04T09:06:57.0024409Z profiler/test_torch_tidy 1/1 2025-12-04T09:06:57.0024606Z lazy/test_reuse_ir 1/1 2025-12-04T09:06:57.0024804Z test_functional_autograd_benchmark 1/1 2025-12-04T09:06:57.0025024Z test_reductions 1/1 2025-12-04T09:06:57.0025198Z 
test_autoload_enable 1/1 2025-12-04T09:06:57.0025378Z Parallel tests (0): 2025-12-04T09:06:57.0025572Z Name: excluded (est. time: 0.0min) 2025-12-04T09:06:57.0025778Z Serial tests (0): 2025-12-04T09:06:57.0025936Z Parallel tests (0): 2025-12-04T09:06:57.0026216Z Running inductor/test_aot_inductor 2/4 ... [2025-12-04 09:06:57.000103][856.928401149] 2025-12-04T09:06:57.0026552Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:06:57.0027311Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:06:57.000433] 2025-12-04T09:15:38.7959112Z 2025-12-04T09:15:38.7960240Z inductor/test_aot_inductor 2/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_2.4_15a925ff16cb0669_.log 2025-12-04T09:15:38.8033583Z Running 227 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorLoggingTest::test_shape_env_reuse, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_no_compile_standalone, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_amp_fallback_random_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_fp8_dtype_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_sym_inputs_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_user_defined_triton_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_3_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_cpu_predicate_cuda_operands_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_cpu_predicate_cuda_operands_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicate_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_parameters_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fill__fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_dynamic_dim_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_on_disk_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_libtorch_free_so_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_missing_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_none_args_aot_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_hann_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_grouped_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_with_unbacked_add_expr_transitive_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_expr_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_i64_input_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_autotuning_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_user_managed_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_nested_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_profiler_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_grid_with_unbacked_symbols_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_bool_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_boolean_indexing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_codegen_int_array_var_fix_memory_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_disable_one_pass_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv3d_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_custom_op_in_subgraph_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_cat_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_scalar_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_embedding_bag_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fft_c2c_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_issue_140766_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_missing_cubin_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_mixed_device_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_model_modified_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multi_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_contiguous_output_alias_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_normal_functional_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_calling_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_fp8_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_grouped_mm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_shifted_constraint_ranges_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_i64_input_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symfloat_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_bool_param_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_profiler_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_multiple_dynamic_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_sym_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_tensor_meta_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bool_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_boolean_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_and_force_mmap_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_disable_one_pass_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_conv3d_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_conv_freezing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_device_moved_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_duplicate_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_cat_dtype_promotion_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_view_of_param_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_freezing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_linear_dynamic_maxautotune_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_cubin_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_model_modified_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_multiple_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_poi_multiple_dynamic_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_profile_benchmark_harness_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_calling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symint_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_dynamic_launcher_grid_infer_from_tensor_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_dynamic_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_extern_kernel_arg_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_expr_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_fn_like_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps 2025-12-04T09:15:38.8104638Z 2025-12-04T09:15:38.8104878Z Finished inductor/test_aot_inductor 2/4 ... [2025-12-04 09:15:38.796056][1378.72435061], took 8.70min 2025-12-04T09:15:38.8105763Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d77224b10dd1e10b.xml 2025-12-04T09:15:39.2309808Z Uploading artifacts took 0.14 seconds 2025-12-04T09:15:39.2312567Z Running dynamo/test_repros 1/1 ... [2025-12-04 09:15:39.231027][1379.159323701] 2025-12-04T09:15:39.2312947Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:15:39.2315647Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_repros.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:15:39.231326] 2025-12-04T09:17:34.2826104Z 2025-12-04T09:17:34.2827263Z dynamo/test_repros 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_repros_1.1_21fd2d3c0d4dd552_.log 2025-12-04T09:17:34.2895714Z Running 351 items in this shard: test/dynamo/test_repros.py::LRUCacheWarningTests::test_lru_cache_warning_issued_during_tracing, test/dynamo/test_repros.py::ReproTests::test_312_local_cell_overlap, test/dynamo/test_repros.py::ReproTests::test_Size, test/dynamo/test_repros.py::ReproTests::test_abc_setattr, test/dynamo/test_repros.py::ReproTests::test_add_complex_conj, test/dynamo/test_repros.py::ReproTests::test_add_sub_alpha_out, test/dynamo/test_repros.py::ReproTests::test_addr_alpha_beta_out, test/dynamo/test_repros.py::ReproTests::test_amp_foreach_fake_impl, test/dynamo/test_repros.py::ReproTests::test_aot_autograd_runtime_wrapper_prologue_profiled, test/dynamo/test_repros.py::ReproTests::test_as_strided_on_base_with_mutation_works, test/dynamo/test_repros.py::ReproTests::test_as_strided_on_existing_view_banned, test/dynamo/test_repros.py::ReproTests::test_attached_attribute_in_dir, test/dynamo/test_repros.py::ReproTests::test_autograd_function_graph_break, test/dynamo/test_repros.py::ReproTests::test_avoid_dupe_specialization, test/dynamo/test_repros.py::ReproTests::test_batch_encoding_clone_inputs, test/dynamo/test_repros.py::ReproTests::test_batch_norm_act, test/dynamo/test_repros.py::ReproTests::test_batchnorm_e2e, test/dynamo/test_repros.py::ReproTests::test_bigbird_unsqueeze_inplace, test/dynamo/test_repros.py::ReproTests::test_bitwise_op_guard, test/dynamo/test_repros.py::ReproTests::test_bitwise_print_precedence, test/dynamo/test_repros.py::ReproTests::test_boxes_len, test/dynamo/test_repros.py::ReproTests::test_build_map_unpack_with_call, test/dynamo/test_repros.py::ReproTests::test_c_defined_metaclass, test/dynamo/test_repros.py::ReproTests::test_cells_unsupported_step_exception, 
test/dynamo/test_repros.py::ReproTests::test_changing_stride, test/dynamo/test_repros.py::ReproTests::test_chunk_reformer_ff, test/dynamo/test_repros.py::ReproTests::test_class_member, test/dynamo/test_repros.py::ReproTests::test_classmethod_with_slots, test/dynamo/test_repros.py::ReproTests::test_clone_not_memory_dense, test/dynamo/test_repros.py::ReproTests::test_compilation_metrics_on_error, test/dynamo/test_repros.py::ReproTests::test_compile_complex_conj, test/dynamo/test_repros.py::ReproTests::test_compile_copy__int_overload, test/dynamo/test_repros.py::ReproTests::test_compiled_module_truthiness, test/dynamo/test_repros.py::ReproTests::test_const_dict_keyerror, test/dynamo/test_repros.py::ReproTests::test_contains_range_constprop, test/dynamo/test_repros.py::ReproTests::test_convert_boxes_to_pooler_format, test/dynamo/test_repros.py::ReproTests::test_copy_weird_strides, test/dynamo/test_repros.py::ReproTests::test_create_rand_mask_from_inputs, test/dynamo/test_repros.py::ReproTests::test_dalle2_maybe, test/dynamo/test_repros.py::ReproTests::test_data_attr_mutation_after_saved_for_bw, test/dynamo/test_repros.py::ReproTests::test_dataclass_in_module, test/dynamo/test_repros.py::ReproTests::test_dataclass_init_with_default_factory_with_inputs, test/dynamo/test_repros.py::ReproTests::test_ddp_checkpoint, test/dynamo/test_repros.py::ReproTests::test_dedup_global, test/dynamo/test_repros.py::ReproTests::test_deferred_runtime_asserts, test/dynamo/test_repros.py::ReproTests::test_delattr, test/dynamo/test_repros.py::ReproTests::test_delattr_raises, test/dynamo/test_repros.py::ReproTests::test_delattr_return, test/dynamo/test_repros.py::ReproTests::test_delete_local_error, test/dynamo/test_repros.py::ReproTests::test_deleted_compile_wrapper_segfault, test/dynamo/test_repros.py::ReproTests::test_delsubscr, test/dynamo/test_repros.py::ReproTests::test_delsubscr_raises, test/dynamo/test_repros.py::ReproTests::test_detectron2_instances_cat, 
test/dynamo/test_repros.py::ReproTests::test_disabling_unpack_hooks_within_compiled_region, test/dynamo/test_repros.py::ReproTests::test_distributions_subclass, test/dynamo/test_repros.py::ReproTests::test_do_paste_mask, test/dynamo/test_repros.py::ReproTests::test_dont_aggressively_write_assert, test/dynamo/test_repros.py::ReproTests::test_dont_dce_rand, test/dynamo/test_repros.py::ReproTests::test_dropout_inline, test/dynamo/test_repros.py::ReproTests::test_dynamic_shape_disable_duck_size, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_double_not_equal, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_float_guard, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_implicit_guard, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_right_side, test/dynamo/test_repros.py::ReproTests::test_dynamo_default_lru_cache_behavior, test/dynamo/test_repros.py::ReproTests::test_dynamo_disable_lru_cache_behavior, test/dynamo/test_repros.py::ReproTests::test_dynamo_set_recursion_limit, test/dynamo/test_repros.py::ReproTests::test_dynamo_set_recursion_limit_usage, test/dynamo/test_repros.py::ReproTests::test_ellipsis, test/dynamo/test_repros.py::ReproTests::test_embedding_backward_broadcasting_decomp, test/dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_False, test/dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_True, test/dynamo/test_repros.py::ReproTests::test_empty_list_contains_with_jump, test/dynamo/test_repros.py::ReproTests::test_empty_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_enum, test/dynamo/test_repros.py::ReproTests::test_ephemeral_module, test/dynamo/test_repros.py::ReproTests::test_error_return_without_exception_set, test/dynamo/test_repros.py::ReproTests::test_exception_in_dynamo_handling, test/dynamo/test_repros.py::ReproTests::test_exec_import, test/dynamo/test_repros.py::ReproTests::test_exec_wildcard_import, 
test/dynamo/test_repros.py::ReproTests::test_export_vs_dynamo_for_multiheadattention, test/dynamo/test_repros.py::ReproTests::test_flip_bad_accuracy, test/dynamo/test_repros.py::ReproTests::test_for_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_for_loop_graph_break_before, test/dynamo/test_repros.py::ReproTests::test_foreach_decomp_arg_names, test/dynamo/test_repros.py::ReproTests::test_fsdp_set_input_mutation_applied_when_input_gets_no_gradients, test/dynamo/test_repros.py::ReproTests::test_function_in_skipfiles, test/dynamo/test_repros.py::ReproTests::test_functools_wraps, test/dynamo/test_repros.py::ReproTests::test_gan_repro_trying_to_backward_through_the_graph_a_second_time, test/dynamo/test_repros.py::ReproTests::test_generator_dealloc, test/dynamo/test_repros.py::ReproTests::test_get_parameter_dtype, test/dynamo/test_repros.py::ReproTests::test_get_type_hints, test/dynamo/test_repros.py::ReproTests::test_global_fn_mutation, test/dynamo/test_repros.py::ReproTests::test_grad, test/dynamo/test_repros.py::ReproTests::test_grad_mode_carrying_correct_state_after_graph_break, test/dynamo/test_repros.py::ReproTests::test_grad_references_cleared, test/dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance, test/dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance_pep585, test/dynamo/test_repros.py::ReproTests::test_graph_break_unsupported_fake, test/dynamo/test_repros.py::ReproTests::test_guard_default_device, test/dynamo/test_repros.py::ReproTests::test_guard_fail_nested_tuple, test/dynamo/test_repros.py::ReproTests::test_guard_fail_tensor_bool, test/dynamo/test_repros.py::ReproTests::test_guard_ordering_shape_fail, test/dynamo/test_repros.py::ReproTests::test_guard_same_frame_fail_message, test/dynamo/test_repros.py::ReproTests::test_guard_with_tuple_mutation, test/dynamo/test_repros.py::ReproTests::test_hasattr_builtin, test/dynamo/test_repros.py::ReproTests::test_hf_bigbird_unsqueeze, 
test/dynamo/test_repros.py::ReproTests::test_hf_classinstantier, test/dynamo/test_repros.py::ReproTests::test_hf_gelu_inline, test/dynamo/test_repros.py::ReproTests::test_hf_model_output, test/dynamo/test_repros.py::ReproTests::test_hf_t5_forward, test/dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_inference, test/dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_training, test/dynamo/test_repros.py::ReproTests::test_iadd_graph_break, test/dynamo/test_repros.py::ReproTests::test_incompatible_configs, test/dynamo/test_repros.py::ReproTests::test_indexing_with_list, test/dynamo/test_repros.py::ReproTests::test_inductor_dynamic_shapes_broadcasting, test/dynamo/test_repros.py::ReproTests::test_inductor_no_recursionerror_on_for_loops, test/dynamo/test_repros.py::ReproTests::test_inductor_rng_default_dtype, test/dynamo/test_repros.py::ReproTests::test_inference_mode_dynamic_shapes, test/dynamo/test_repros.py::ReproTests::test_inlining_cornercase, test/dynamo/test_repros.py::ReproTests::test_inplace_unsqueeze_input, test/dynamo/test_repros.py::ReproTests::test_int_format, test/dynamo/test_repros.py::ReproTests::test_intermediate_leaf_requires_grad, test/dynamo/test_repros.py::ReproTests::test_invalid_seq_unpack, test/dynamo/test_repros.py::ReproTests::test_is_make_fx_tracing, test/dynamo/test_repros.py::ReproTests::test_is_symbolic_tracing, test/dynamo/test_repros.py::ReproTests::test_isinstance_dtype, test/dynamo/test_repros.py::ReproTests::test_isinstance_storage, test/dynamo/test_repros.py::ReproTests::test_issue111522, test/dynamo/test_repros.py::ReproTests::test_issue111918, test/dynamo/test_repros.py::ReproTests::test_issue114171, test/dynamo/test_repros.py::ReproTests::test_issue126128, test/dynamo/test_repros.py::ReproTests::test_issue134451, test/dynamo/test_repros.py::ReproTests::test_issue1466_size_aot_autograd, test/dynamo/test_repros.py::ReproTests::test_issue164247_backend_eager, 
test/dynamo/test_repros.py::ReproTests::test_issue164247_backend_inductor, test/dynamo/test_repros.py::ReproTests::test_issue175, test/dynamo/test_repros.py::ReproTests::test_jit_script_defaults, test/dynamo/test_repros.py::ReproTests::test_jit_trace_errors, test/dynamo/test_repros.py::ReproTests::test_kwargs_out_list_variable, test/dynamo/test_repros.py::ReproTests::test_list_aliasing, test/dynamo/test_repros.py::ReproTests::test_list_index, test/dynamo/test_repros.py::ReproTests::test_list_index_not_found, test/dynamo/test_repros.py::ReproTests::test_list_index_tensor_unsupported, test/dynamo/test_repros.py::ReproTests::test_list_reverse, test/dynamo/test_repros.py::ReproTests::test_list_self_reference, test/dynamo/test_repros.py::ReproTests::test_listcomp, test/dynamo/test_repros.py::ReproTests::test_longformer_chunk, test/dynamo/test_repros.py::ReproTests::test_longtensor_list, test/dynamo/test_repros.py::ReproTests::test_lru_cache_tracing, test/dynamo/test_repros.py::ReproTests::test_maml_item_capture, test/dynamo/test_repros.py::ReproTests::test_maml_no_item_capture, test/dynamo/test_repros.py::ReproTests::test_many_overlapping_inputs_does_not_explode_guards, test/dynamo/test_repros.py::ReproTests::test_many_views_with_mutation, test/dynamo/test_repros.py::ReproTests::test_map_with_multiple_args, test/dynamo/test_repros.py::ReproTests::test_maybe_multiply_symint, test/dynamo/test_repros.py::ReproTests::test_mem_leak_guards, test/dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list1, test/dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list2, test/dynamo/test_repros.py::ReproTests::test_method_overriding, test/dynamo/test_repros.py::ReproTests::test_module_in_skipfiles, test/dynamo/test_repros.py::ReproTests::test_modules, test/dynamo/test_repros.py::ReproTests::test_multi_dot_import, test/dynamo/test_repros.py::ReproTests::test_multi_import, test/dynamo/test_repros.py::ReproTests::test_named_buffers, 
test/dynamo/test_repros.py::ReproTests::test_nanmean_out, test/dynamo/test_repros.py::ReproTests::test_negative_floor_div_solve, test/dynamo/test_repros.py::ReproTests::test_negative_shape_guard, test/dynamo/test_repros.py::ReproTests::test_nested_while_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_nn_module_callable, test/dynamo/test_repros.py::ReproTests::test_nn_module_property_closure, test/dynamo/test_repros.py::ReproTests::test_nn_module_stack_bc, test/dynamo/test_repros.py::ReproTests::test_nn_param_freevar_codegen, test/dynamo/test_repros.py::ReproTests::test_nn_parameter, test/dynamo/test_repros.py::ReproTests::test_nn_parameter_ctor_graph_breaks, test/dynamo/test_repros.py::ReproTests::test_nn_parametrize, test/dynamo/test_repros.py::ReproTests::test_no_grad_inline, test/dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame, test/dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame_ctx_manager, test/dynamo/test_repros.py::ReproTests::test_nonconst_issubclass, test/dynamo/test_repros.py::ReproTests::test_not_rewrite_assert_for_other_errors, test/dynamo/test_repros.py::ReproTests::test_nullcontext1, test/dynamo/test_repros.py::ReproTests::test_nullcontext2, test/dynamo/test_repros.py::ReproTests::test_numpy_not_ndarray_recompiles, test/dynamo/test_repros.py::ReproTests::test_numpy_tobytes_no_error, test/dynamo/test_repros.py::ReproTests::test_odict_get_item_index_name, test/dynamo/test_repros.py::ReproTests::test_omegaconf_dictconfig, test/dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_contains, test/dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_iter, test/dynamo/test_repros.py::ReproTests::test_ones_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_optim_state_references_cleared, test/dynamo/test_repros.py::ReproTests::test_optimized_deepcopy, test/dynamo/test_repros.py::ReproTests::test_optimized_module_patched_init, 
test/dynamo/test_repros.py::ReproTests::test_optimized_module_training, test/dynamo/test_repros.py::ReproTests::test_os_fspath, test/dynamo/test_repros.py::ReproTests::test_out_nested_cell_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_nested_cell_tuple_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_none, test/dynamo/test_repros.py::ReproTests::test_out_overload_non_contiguous, test/dynamo/test_repros.py::ReproTests::test_out_root_cell_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_root_cell_tuple_shape_change, test/dynamo/test_repros.py::ReproTests::test_output_aliases_intermediate, test/dynamo/test_repros.py::ReproTests::test_overlapping_inputs_with_dynamic_shapes_error, test/dynamo/test_repros.py::ReproTests::test_overwriting_params, test/dynamo/test_repros.py::ReproTests::test_partially_initialized_module_property, test/dynamo/test_repros.py::ReproTests::test_partitioner_activation_memory_budget_with_unbacked_symints, test/dynamo/test_repros.py::ReproTests::test_partitioner_cse_respects_mutation_boundaries, test/dynamo/test_repros.py::ReproTests::test_pointless_graph_removal, test/dynamo/test_repros.py::ReproTests::test_preserve_stride_with_clone, test/dynamo/test_repros.py::ReproTests::test_primtorch, test/dynamo/test_repros.py::ReproTests::test_primtorch_no_graph_break, test/dynamo/test_repros.py::ReproTests::test_randint_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_recursive_map, test/dynamo/test_repros.py::ReproTests::test_reformer_eval, test/dynamo/test_repros.py::ReproTests::test_reformer_min_chunk_len, test/dynamo/test_repros.py::ReproTests::test_reformer_sorting, test/dynamo/test_repros.py::ReproTests::test_reformer_train, test/dynamo/test_repros.py::ReproTests::test_reinplacing, test/dynamo/test_repros.py::ReproTests::test_relative_import, test/dynamo/test_repros.py::ReproTests::test_relative_import_no_modulename, 
test/dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode1, test/dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode2, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass1, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass2, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass3, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_mixed_grad, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_scalar, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_tensor, test/dynamo/test_repros.py::ReproTests::test_return_weakref, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_dont_change_bytecode, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_noop, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_msg, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_non_string_msg, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_without_msg, test/dynamo/test_repros.py::ReproTests::test_rng_state, test/dynamo/test_repros.py::ReproTests::test_seq_append_list, test/dynamo/test_repros.py::ReproTests::test_setattr_requires_grad_graph_breaks, test/dynamo/test_repros.py::ReproTests::test_setitem_boolean_mask_diff, test/dynamo/test_repros.py::ReproTests::test_setitem_tensor_prop, test/dynamo/test_repros.py::ReproTests::test_setitem_tuple_boolean_mask_diff, test/dynamo/test_repros.py::ReproTests::test_sigmoid_out, test/dynamo/test_repros.py::ReproTests::test_sigmoid_out2, test/dynamo/test_repros.py::ReproTests::test_size_typematch, test/dynamo/test_repros.py::ReproTests::test_slice_into_list_mutable, test/dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape, test/dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape_setitem, test/dynamo/test_repros.py::ReproTests::test_sort_out, test/dynamo/test_repros.py::ReproTests::test_sort_out2, 
test/dynamo/test_repros.py::ReproTests::test_specialized_stride, test/dynamo/test_repros.py::ReproTests::test_split_with_sizes_aot_autograd, test/dynamo/test_repros.py::ReproTests::test_staticmethod_allow_in_graph, test/dynamo/test_repros.py::ReproTests::test_stk_sdd_is_transposed, test/dynamo/test_repros.py::ReproTests::test_stop_iteration_reconstruct, test/dynamo/test_repros.py::ReproTests::test_str_isalnum, test/dynamo/test_repros.py::ReproTests::test_string_format, test/dynamo/test_repros.py::ReproTests::test_subclass_graph_output_repro, test/dynamo/test_repros.py::ReproTests::test_super_classmethod, test/dynamo/test_repros.py::ReproTests::test_super_classmethod_inheritance, test/dynamo/test_repros.py::ReproTests::test_super_diamond, test/dynamo/test_repros.py::ReproTests::test_super_in_staticmethod, test/dynamo/test_repros.py::ReproTests::test_super_staticmethod, test/dynamo/test_repros.py::ReproTests::test_swin_base_tensor_attr, test/dynamo/test_repros.py::ReproTests::test_symint_bitwise, test/dynamo/test_repros.py::ReproTests::test_symnode_is_not_op, test/dynamo/test_repros.py::ReproTests::test_symnode_is_op, test/dynamo/test_repros.py::ReproTests::test_sys_monitoring, test/dynamo/test_repros.py::ReproTests::test_tensor_data_kwarg, test/dynamo/test_repros.py::ReproTests::test_tensor_isinstance_tuple, test/dynamo/test_repros.py::ReproTests::test_tensor_item, test/dynamo/test_repros.py::ReproTests::test_tensor_random, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func2, 
test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_mismatched_dtype, test/dynamo/test_repros.py::ReproTests::test_tensor_split, test/dynamo/test_repros.py::ReproTests::test_tensor_split_within_device_cm, test/dynamo/test_repros.py::ReproTests::test_tensor_uniform, test/dynamo/test_repros.py::ReproTests::test_threading_local, test/dynamo/test_repros.py::ReproTests::test_tokenization, test/dynamo/test_repros.py::ReproTests::test_torch_compile_in_compile_frame, test/dynamo/test_repros.py::ReproTests::test_torch_ops_aten, test/dynamo/test_repros.py::ReproTests::test_torch_tensor_ops, test/dynamo/test_repros.py::ReproTests::test_torch_tensor_ops_no_graph_break, test/dynamo/test_repros.py::ReproTests::test_torch_variable_type, test/dynamo/test_repros.py::ReproTests::test_torchname, test/dynamo/test_repros.py::ReproTests::test_trace_functional_tensor_with, test/dynamo/test_repros.py::ReproTests::test_tuple_enum_as_key_dict, test/dynamo/test_repros.py::ReproTests::test_typed_dict, test/dynamo/test_repros.py::ReproTests::test_typed_dict_total, test/dynamo/test_repros.py::ReproTests::test_udf_classes_reconstruction, test/dynamo/test_repros.py::ReproTests::test_unbacked_arange_in_bounds, test/dynamo/test_repros.py::ReproTests::test_unbind_copy_out, test/dynamo/test_repros.py::ReproTests::test_unpack_hooks_can_be_disabled, test/dynamo/test_repros.py::ReproTests::test_unpack_hooks_dont_run_during_tracing, test/dynamo/test_repros.py::ReproTests::test_unspecialized_nn_module_with_torch_variable_attribute, test/dynamo/test_repros.py::ReproTests::test_unsqueeze_mul_strides, 
test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager, test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init, test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init_graph_break, test/dynamo/test_repros.py::ReproTests::test_user_defined_iter, test/dynamo/test_repros.py::ReproTests::test_user_defined_object_callable, test/dynamo/test_repros.py::ReproTests::test_validate_model_kwargs, test/dynamo/test_repros.py::ReproTests::test_vc_bumped_in_inference_graph, test/dynamo/test_repros.py::ReproTests::test_vdd_duplicate_error, test/dynamo/test_repros.py::ReproTests::test_view_dtype_overload, test/dynamo/test_repros.py::ReproTests::test_weakref, test/dynamo/test_repros.py::ReproTests::test_weakref_callback, test/dynamo/test_repros.py::ReproTests::test_weakref_construction, test/dynamo/test_repros.py::ReproTests::test_weakref_del, test/dynamo/test_repros.py::ReproTests::test_weakref_proxy, test/dynamo/test_repros.py::ReproTests::test_weakref_reconstruct, test/dynamo/test_repros.py::ReproTests::test_while_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_while_loop_graph_break_inside_call_function, test/dynamo/test_repros.py::ReproTests::test_with_on_graph_break_inst, test/dynamo/test_repros.py::ReproTests::test_with_on_graph_break_nested, test/dynamo/test_repros.py::ReproTests::test_zeros_out_dynamic, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_cuda_sync_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_current_accelerator_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_data_dependent_error_log_no_print_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_deepcopy_constant_tensor_in_aot_bwd_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_safe_grad_warning_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_user_warnings_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_warnings_cuda, 
test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_flash_attn_backward_mixed_strides_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_getattr_return_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_guard_default_device_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_megablocks_moe_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_memleak_when_graph_input_has_tensor_attr_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_module_attribute_error_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_named_tuple_vt_clone_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_norm_dtype_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_partial_export_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_partitioner_saves_weights_for_bw_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_get_node_type_not_traced_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_get_node_type_with_namedtuple_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_tree_is_leaf_not_traced_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_tree_is_leaf_with_namedtuple_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sdpa_dynamic_shapes_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sub_alpha_scalar_repro_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_tensor_size_hasattr_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_torch_cuda_is_initialized_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_truthiness_of_symints_no_recompiles_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_udf_class_source_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_zero_dim_param_mixed_device_grad_cuda 2025-12-04T09:17:34.2964079Z 2025-12-04T09:17:34.2964304Z Finished dynamo/test_repros 1/1 ... 
[2025-12-04 09:17:34.282856][1494.211150867], took 1.92min 2025-12-04T09:17:34.2965014Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-87366e2d7057b5b0.xml 2025-12-04T09:17:34.4077811Z Running inductor/test_flex_attention 2/6 ... [2025-12-04 09:17:34.407532][1494.335830862] 2025-12-04T09:17:34.4078294Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:17:34.4081135Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_attention.py', '--shard-id=2', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:17:34.407854] 2025-12-04T09:26:36.5449821Z 2025-12-04T09:26:36.5450726Z inductor/test_flex_attention 2/6 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_attention_2.6_90be3f66c016358d_.log 2025-12-04T09:26:36.5500255Z Running 135 items in this shard: test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_causal_mask_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_backend_defaults_and_rejects_invalid_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_backend_rejects_legacy_force_use_flag_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_block_mask_non_divisible_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE3_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_256_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod6_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod7_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_causal_block_non_divisible_with_captured_buffer_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_cpu_error_message_return_lse_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_document_masking_edge_case_mode_aot_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_document_masking_edge_case_mode_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dynamic_shapes_bug_dynamic_batch_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_epilogue_fused_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order0_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order1_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order4_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order4_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fully_masked_out_rows_0_check_compile_True_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_function_composition_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kernel_options_argument_is_respected_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod1_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_lse_masked_output_backend_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_modular_indexing_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_njt_causal_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims0_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_pow_2_headdim_head_dim_24_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_padded_dense_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux__alibi_bias_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_max__rel_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_silu_on_score_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s0_v_s0_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s2_v_s2_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s3_v_s3_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s0_v_s0_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s1_v_s1_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s2_v_s2_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_subgraph_respect_decompostion_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_triton_template_warp_specialization_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod5_cuda_float32, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_attributes_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_operations_with_none_q_indices_cuda, 
test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_vs_sequence_lengths_compile_True_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_compiling_create_block_mask_no_recompile_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_pytree_preserves_new_attributes_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_backprop_error_case_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_default_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda 2025-12-04T09:26:36.5549097Z 2025-12-04T09:26:36.5549336Z Finished inductor/test_flex_attention 2/6 ... [2025-12-04 09:26:36.544472][2036.472760582], took 9.04min 2025-12-04T09:26:36.5550155Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-f32aa134ae4d7a45.xml 2025-12-04T09:26:36.6280935Z Running inductor/test_cuda_select_algorithm 1/1 ... 
[2025-12-04 09:26:36.627786][2036.556084059] 2025-12-04T09:26:36.6281455Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:26:36.6284038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_select_algorithm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:26:36.628107] 2025-12-04T10:11:57.3476574Z 2025-12-04T10:11:57.3477472Z PRINTING LOG FILE of inductor/test_cuda_select_algorithm 1/1 (test/test-reports/inductor.test_cuda_select_algorithm_1.1_c5144f504c6801ae_.log) 2025-12-04T10:11:57.3478549Z W1204 09:26:41.492000 34150 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.3479461Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml 2025-12-04T10:11:57.3480321Z ============================= test session starts ============================== 2025-12-04T10:11:57.3480754Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.3481337Z cachedir: .pytest_cache 2025-12-04T10:11:57.3481819Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.3482465Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.3482786Z configfile: pytest.ini 2025-12-04T10:11:57.3483406Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.3484056Z collecting ... 
collected 58 items 2025-12-04T10:11:57.3484325Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:11:57.3528019Z Running 58 items in this shard: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.3569888Z 2025-12-04T10:11:57.3570806Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0067s] [ 1%] 2025-12-04T10:11:57.3572730Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5930s] [ 1%] 2025-12-04T10:11:57.3574724Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5913s] [ 1%] 2025-12-04T10:11:57.3575407Z 2025-12-04T10:11:57.3575520Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.3576419Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3577248Z Traceback (most recent call last): 2025-12-04T10:11:57.3578032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3578776Z method(*args, **kwargs) 2025-12-04T10:11:57.3579475Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3580228Z method(*args, **kwargs) 2025-12-04T10:11:57.3580933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3581685Z with policy(): 2025-12-04T10:11:57.3582377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3583132Z raise RuntimeError(msg) 2025-12-04T10:11:57.3584752Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.3586292Z 2025-12-04T10:11:57.3586521Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3587711Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3588775Z 2025-12-04T10:11:57.3589054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3589686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3590221Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3591100Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3592041Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3592484Z graph_break [] 2025-12-04T10:11:57.3593195Z _ 
TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3593973Z Traceback (most recent call last): 2025-12-04T10:11:57.3594741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3595484Z method(*args, **kwargs) 2025-12-04T10:11:57.3596191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3596930Z method(*args, **kwargs) 2025-12-04T10:11:57.3597665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3598436Z with policy(): 2025-12-04T10:11:57.3599117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3599963Z raise RuntimeError(msg) 2025-12-04T10:11:57.3601712Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.3603264Z 2025-12-04T10:11:57.3603501Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3604835Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3606162Z 2025-12-04T10:11:57.3606437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3607126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3607688Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3608669Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3609698Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3610158Z graph_break [] 2025-12-04T10:11:57.3610571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3611063Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3611496Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3612462Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3613293Z graph_break [] 2025-12-04T10:11:57.3613594Z =================================== FAILURES =================================== 2025-12-04T10:11:57.3614426Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3615209Z Traceback (most recent call last): 2025-12-04T10:11:57.3615980Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3616686Z method(*args, **kwargs) 2025-12-04T10:11:57.3617560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3618332Z method(*args, **kwargs) 2025-12-04T10:11:57.3618866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3619478Z with policy(): 2025-12-04T10:11:57.3620154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3620910Z raise RuntimeError(msg) 2025-12-04T10:11:57.3622411Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.3623972Z 2025-12-04T10:11:57.3624217Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3625551Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3626607Z 2025-12-04T10:11:57.3626905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3627596Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3628187Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3628973Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3630012Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3630526Z graph_break [] 2025-12-04T10:11:57.3630866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3631389Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3632070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3632865Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3633433Z graph_break [] 2025-12-04T10:11:57.3633849Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3634326Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3634810Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3635707Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), 
('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3636442Z graph_break [] 2025-12-04T10:11:57.3637372Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml - 2025-12-04T10:11:57.3638081Z =========================== short test summary info ============================ 2025-12-04T10:11:57.3640174Z FAILED [0.5913s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.3642598Z 2025-12-04T10:11:57.3642831Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3644124Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3645216Z 2025-12-04T10:11:57.3645521Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3646150Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.3646644Z ========================== 1 failed, 2 rerun in 3.22s ========================== 2025-12-04T10:11:57.3647057Z Got exit code 1 2025-12-04T10:11:57.3647337Z Retrying single test... 
2025-12-04T10:11:57.3647933Z W1204 09:26:51.254000 34332 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.3649232Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml 2025-12-04T10:11:57.3650137Z ============================= test session starts ============================== 2025-12-04T10:11:57.3650817Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.3651410Z cachedir: .pytest_cache 2025-12-04T10:11:57.3652138Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.3652974Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.3653351Z configfile: pytest.ini 2025-12-04T10:11:57.3654054Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.3655004Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.3656384Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3657584Z Running 1 items in this shard 2025-12-04T10:11:57.3657823Z 2025-12-04T10:11:57.3659314Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:26:52.365432592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3660850Z 2025-12-04T10:11:57.3661320Z [W1204 09:27:01.454677478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3661971Z 2025-12-04T10:11:57.3662533Z [W1204 09:27:01.454996623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3663207Z 2025-12-04T10:11:57.3663639Z [W1204 09:27:01.455565703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3664306Z 2025-12-04T10:11:57.3664691Z [W1204 09:27:01.455757606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3665332Z 2025-12-04T10:11:57.3665851Z [W1204 09:27:01.456962397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3666445Z 2025-12-04T10:11:57.3666777Z [W1204 09:27:01.457115099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3667384Z 2025-12-04T10:11:57.3667928Z [W1204 09:27:01.457394864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3668605Z 2025-12-04T10:11:57.3669166Z [W1204 09:27:01.457569317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3669770Z 2025-12-04T10:11:57.3670301Z [W1204 09:27:01.466029442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3670933Z 2025-12-04T10:11:57.3671278Z [W1204 09:27:01.466230516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3671992Z 2025-12-04T10:11:57.3672426Z [W1204 09:27:01.466403159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3673078Z 2025-12-04T10:11:57.3673623Z [W1204 09:27:01.466631803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3674263Z 2025-12-04T10:11:57.3674794Z [W1204 09:27:01.466777735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3675475Z 2025-12-04T10:11:57.3675908Z [W1204 09:27:01.467027499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3676538Z 2025-12-04T10:11:57.3677079Z [W1204 09:27:01.467163982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3677762Z 2025-12-04T10:11:57.3678245Z [W1204 09:27:01.467394526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3678915Z 2025-12-04T10:11:57.3679377Z [W1204 09:27:01.467531328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3680138Z 2025-12-04T10:11:57.3680819Z [W1204 09:27:01.554995499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3681537Z 2025-12-04T10:11:57.3682025Z [W1204 09:27:01.555211613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3682768Z 2025-12-04T10:11:57.3683159Z [W1204 09:27:01.555362955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3683755Z 2025-12-04T10:11:57.3684280Z [W1204 09:27:01.555567449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3684968Z 2025-12-04T10:11:57.3685491Z [W1204 09:27:01.555694131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3686202Z 2025-12-04T10:11:57.3686762Z [W1204 09:27:01.555908245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3687448Z 2025-12-04T10:11:57.3688003Z [W1204 09:27:01.556031197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3688675Z 2025-12-04T10:11:57.3689189Z [W1204 09:27:01.556237141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3689859Z 2025-12-04T10:11:57.3690372Z [W1204 09:27:01.556366093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3691079Z 2025-12-04T10:11:57.3691237Z ('RERUN', {'yellow': True}) [11.1055s] [100%] 2025-12-04T10:11:57.3692856Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:02.785652369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3694375Z 2025-12-04T10:11:57.3694934Z [W1204 09:27:02.785904843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3695592Z 2025-12-04T10:11:57.3696087Z [W1204 09:27:02.786058606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3696637Z 2025-12-04T10:11:57.3697146Z [W1204 09:27:02.786267639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3697799Z 2025-12-04T10:11:57.3698311Z [W1204 09:27:02.786397221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3698967Z 2025-12-04T10:11:57.3699409Z [W1204 09:27:02.786614455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3700113Z 2025-12-04T10:11:57.3700644Z [W1204 09:27:02.786736497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3701198Z 2025-12-04T10:11:57.3701761Z [W1204 09:27:02.786938841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3702256Z 2025-12-04T10:11:57.3702772Z [W1204 09:27:02.787065343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3703415Z 2025-12-04T10:11:57.3703956Z [W1204 09:27:02.793157197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3704660Z 2025-12-04T10:11:57.3705354Z [W1204 09:27:02.793335531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3706037Z 2025-12-04T10:11:57.3706589Z [W1204 09:27:02.793490313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3707214Z 2025-12-04T10:11:57.3707754Z [W1204 09:27:02.793696437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3708597Z 2025-12-04T10:11:57.3709135Z [W1204 09:27:02.793823239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3709815Z 2025-12-04T10:11:57.3710228Z [W1204 09:27:02.794059723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3710815Z 2025-12-04T10:11:57.3711342Z [W1204 09:27:02.794195405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3712056Z 2025-12-04T10:11:57.3712619Z [W1204 09:27:02.794403459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3713321Z 2025-12-04T10:11:57.3713809Z [W1204 09:27:02.794525251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3714462Z 2025-12-04T10:11:57.3715011Z [W1204 09:27:02.876171272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3715554Z 2025-12-04T10:11:57.3716115Z [W1204 09:27:02.876396526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3716778Z 2025-12-04T10:11:57.3717479Z [W1204 09:27:02.876548479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3718202Z 2025-12-04T10:11:57.3718748Z [W1204 09:27:02.876754152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3719417Z 2025-12-04T10:11:57.3719787Z [W1204 09:27:02.876879174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3720495Z 2025-12-04T10:11:57.3721023Z [W1204 09:27:02.877092258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3721664Z 2025-12-04T10:11:57.3722224Z [W1204 09:27:02.877214230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3722907Z 2025-12-04T10:11:57.3723448Z [W1204 09:27:02.877415363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3724125Z 2025-12-04T10:11:57.3724660Z [W1204 09:27:02.877535425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3725310Z 2025-12-04T10:11:57.3725463Z ('RERUN', {'yellow': True}) [0.5560s] [100%] 2025-12-04T10:11:57.3727004Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:03.339418552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3728393Z 2025-12-04T10:11:57.3728942Z [W1204 09:27:03.339633386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3729634Z 2025-12-04T10:11:57.3730191Z [W1204 09:27:03.339783629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3730742Z 2025-12-04T10:11:57.3731497Z [W1204 09:27:03.339995102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3732207Z 2025-12-04T10:11:57.3732757Z [W1204 09:27:03.340143785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3733491Z 2025-12-04T10:11:57.3734045Z [W1204 09:27:03.340381999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3734697Z 2025-12-04T10:11:57.3735020Z [W1204 09:27:03.340513921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3735706Z 2025-12-04T10:11:57.3736198Z [W1204 09:27:03.340720045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3736811Z 2025-12-04T10:11:57.3737358Z [W1204 09:27:03.340844117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3738030Z 2025-12-04T10:11:57.3738565Z [W1204 09:27:03.346733597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3739266Z 2025-12-04T10:11:57.3739817Z [W1204 09:27:03.346902610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3740438Z 2025-12-04T10:11:57.3740982Z [W1204 09:27:03.347053573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3741638Z 2025-12-04T10:11:57.3742154Z [W1204 09:27:03.347256656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3742841Z 2025-12-04T10:11:57.3743375Z [W1204 09:27:03.347382588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3744050Z 2025-12-04T10:11:57.3744569Z [W1204 09:27:03.347598712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3745265Z 2025-12-04T10:11:57.3745799Z [W1204 09:27:03.347727074 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3746461Z 2025-12-04T10:11:57.3746932Z [W1204 09:27:03.347931418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3747576Z 2025-12-04T10:11:57.3748102Z [W1204 09:27:03.348057610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3748726Z 2025-12-04T10:11:57.3749278Z [W1204 09:27:03.429108151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3749981Z 2025-12-04T10:11:57.3750510Z [W1204 09:27:03.429291695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3751184Z 2025-12-04T10:11:57.3751731Z [W1204 09:27:03.429441157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3752401Z 2025-12-04T10:11:57.3752915Z [W1204 09:27:03.429645651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3753559Z 2025-12-04T10:11:57.3754055Z [W1204 09:27:03.429770363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3754697Z 2025-12-04T10:11:57.3755342Z [W1204 09:27:03.429984577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3759553Z 2025-12-04T10:11:57.3760182Z [W1204 09:27:03.430126819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3760844Z 2025-12-04T10:11:57.3761502Z [W1204 09:27:03.430348153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3798108Z 2025-12-04T10:11:57.3798693Z [W1204 09:27:03.430469925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3799340Z 2025-12-04T10:11:57.3799446Z FAILED [0.5523s] [100%] 2025-12-04T10:11:57.3799626Z 2025-12-04T10:11:57.3799773Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.3800658Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3801429Z Traceback (most recent call last): 2025-12-04T10:11:57.3802145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3802860Z method(*args, **kwargs) 2025-12-04T10:11:57.3803529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3804213Z method(*args, **kwargs) 2025-12-04T10:11:57.3804862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3805534Z with policy(): 2025-12-04T10:11:57.3806192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.3806883Z raise RuntimeError(msg) 2025-12-04T10:11:57.3808403Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.3809786Z 2025-12-04T10:11:57.3809995Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3811168Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3812112Z 2025-12-04T10:11:57.3812378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3812982Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3813495Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3814399Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3815321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3815775Z graph_break [] 2025-12-04T10:11:57.3816156Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.3817779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.3819108Z if out == self.unknown_value: 2025-12-04T10:11:57.3819583Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3820384Z Traceback (most recent call last): 2025-12-04T10:11:57.3821292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3822122Z method(*args, **kwargs) 2025-12-04T10:11:57.3822920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3823880Z method(*args, **kwargs) 2025-12-04T10:11:57.3824585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3825337Z with policy(): 2025-12-04T10:11:57.3825817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3826639Z raise RuntimeError(msg) 2025-12-04T10:11:57.3828322Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.3829437Z 2025-12-04T10:11:57.3829579Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3830328Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3830946Z 2025-12-04T10:11:57.3831111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3831488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3831802Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3832338Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3832914Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3833190Z graph_break [] 2025-12-04T10:11:57.3833410Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.3834344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.3835201Z if out == self.unknown_value: 2025-12-04T10:11:57.3835456Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3835761Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3836059Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3836620Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3837116Z graph_break [] 2025-12-04T10:11:57.3837292Z =================================== FAILURES =================================== 2025-12-04T10:11:57.3837775Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3838242Z Traceback (most recent call last): 2025-12-04T10:11:57.3838691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3839138Z method(*args, **kwargs) 2025-12-04T10:11:57.3839553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3840107Z method(*args, **kwargs) 2025-12-04T10:11:57.3840518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3840960Z with policy(): 2025-12-04T10:11:57.3841464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3841905Z raise RuntimeError(msg) 2025-12-04T10:11:57.3842870Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.3843858Z 2025-12-04T10:11:57.3844088Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3844935Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3845547Z 2025-12-04T10:11:57.3845718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3846091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3846414Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3846940Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3847497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3847765Z graph_break [] 2025-12-04T10:11:57.3847991Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.3848912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.3849751Z if out == self.unknown_value: 2025-12-04T10:11:57.3850004Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3850304Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3850592Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3851150Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3851631Z graph_break [] 2025-12-04T10:11:57.3851842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3852135Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3852425Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3852969Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3853454Z graph_break [] 2025-12-04T10:11:57.3854039Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml - 2025-12-04T10:11:57.3854704Z =========================== short test summary info ============================ 2025-12-04T10:11:57.3856219Z FAILED [0.5523s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.3857600Z 2025-12-04T10:11:57.3857822Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3858566Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3859242Z 2025-12-04T10:11:57.3859403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3859748Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.3860052Z ================== 1 failed, 57 deselected, 2 rerun in 12.24s ================== 2025-12-04T10:11:57.3860305Z Got exit code 1 2025-12-04T10:11:57.3860461Z Retrying single test... 2025-12-04T10:11:57.3860835Z W1204 09:27:10.082000 34519 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.3861572Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml 2025-12-04T10:11:57.3862130Z ============================= test session starts ============================== 2025-12-04T10:11:57.3862516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.3862869Z cachedir: .pytest_cache 2025-12-04T10:11:57.3863283Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.3863735Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.3863942Z configfile: pytest.ini 2025-12-04T10:11:57.3864368Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.3864888Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.3865685Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3866415Z Running 1 items in this shard 2025-12-04T10:11:57.3866545Z 2025-12-04T10:11:57.3867299Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:11.193227255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3868118Z 2025-12-04T10:11:57.3868424Z [W1204 09:27:20.420740292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3868800Z 2025-12-04T10:11:57.3869092Z [W1204 09:27:20.421006256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3869465Z 2025-12-04T10:11:57.3869755Z [W1204 09:27:20.421586837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3870126Z 2025-12-04T10:11:57.3870426Z [W1204 09:27:20.421775430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3870801Z 2025-12-04T10:11:57.3871089Z [W1204 09:27:20.423026321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3871457Z 2025-12-04T10:11:57.3871749Z [W1204 09:27:20.423192094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3872116Z 2025-12-04T10:11:57.3872408Z [W1204 09:27:20.423477369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3872773Z 2025-12-04T10:11:57.3873137Z [W1204 09:27:20.423658082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3873509Z 2025-12-04T10:11:57.3873797Z [W1204 09:27:20.432140088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3874166Z 2025-12-04T10:11:57.3874543Z [W1204 09:27:20.432354782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3874909Z 2025-12-04T10:11:57.3875215Z [W1204 09:27:20.432534105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3875579Z 2025-12-04T10:11:57.3875869Z [W1204 09:27:20.432772829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3876234Z 2025-12-04T10:11:57.3876527Z [W1204 09:27:20.432924802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3876895Z 2025-12-04T10:11:57.3877182Z [W1204 09:27:20.433172366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3877549Z 2025-12-04T10:11:57.3877834Z [W1204 09:27:20.433312759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3878203Z 2025-12-04T10:11:57.3878494Z [W1204 09:27:20.433544363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3878859Z 2025-12-04T10:11:57.3879147Z [W1204 09:27:20.433683205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3879513Z 2025-12-04T10:11:57.3879800Z [W1204 09:27:20.522317756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3880278Z 2025-12-04T10:11:57.3880577Z [W1204 09:27:20.522539470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3880947Z 2025-12-04T10:11:57.3881237Z [W1204 09:27:20.522693582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3881609Z 2025-12-04T10:11:57.3881899Z [W1204 09:27:20.522905526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3882267Z 2025-12-04T10:11:57.3882554Z [W1204 09:27:20.523031618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3882930Z 2025-12-04T10:11:57.3883221Z [W1204 09:27:20.523248462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3883595Z 2025-12-04T10:11:57.3883886Z [W1204 09:27:20.523373044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3884252Z 2025-12-04T10:11:57.3884547Z [W1204 09:27:20.523583258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3884920Z 2025-12-04T10:11:57.3885214Z [W1204 09:27:20.523704070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3885578Z 2025-12-04T10:11:57.3885662Z ('RERUN', {'yellow': True}) [11.2481s] [100%] 2025-12-04T10:11:57.3886576Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:21.758525591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3887391Z 2025-12-04T10:11:57.3887760Z [W1204 09:27:21.758776755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3888139Z 2025-12-04T10:11:57.3888429Z [W1204 09:27:21.758925718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3888932Z 2025-12-04T10:11:57.3889224Z [W1204 09:27:21.759141312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3889591Z 2025-12-04T10:11:57.3889878Z [W1204 09:27:21.759267554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3890247Z 2025-12-04T10:11:57.3890535Z [W1204 09:27:21.759485057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3890904Z 2025-12-04T10:11:57.3891196Z [W1204 09:27:21.759612620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3891564Z 2025-12-04T10:11:57.3891856Z [W1204 09:27:21.759819493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3892223Z 2025-12-04T10:11:57.3892519Z [W1204 09:27:21.759947325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3892887Z 2025-12-04T10:11:57.3893194Z [W1204 09:27:21.766013180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3893572Z 2025-12-04T10:11:57.3893864Z [W1204 09:27:21.766190793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3894240Z 2025-12-04T10:11:57.3894532Z [W1204 09:27:21.766339315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3894898Z 2025-12-04T10:11:57.3895193Z [W1204 09:27:21.766539979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3895561Z 2025-12-04T10:11:57.3895856Z [W1204 09:27:21.766670551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3896226Z 2025-12-04T10:11:57.3896514Z [W1204 09:27:21.766884535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3896885Z 2025-12-04T10:11:57.3897175Z [W1204 09:27:21.767005147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3897551Z 2025-12-04T10:11:57.3897842Z [W1204 09:27:21.767208430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3898220Z 2025-12-04T10:11:57.3898518Z [W1204 09:27:21.767327962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3898888Z 2025-12-04T10:11:57.3899184Z [W1204 09:27:21.851271173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3899551Z 2025-12-04T10:11:57.3899840Z [W1204 09:27:21.851492876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3900214Z 2025-12-04T10:11:57.3900504Z [W1204 09:27:21.851644199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3900877Z 2025-12-04T10:11:57.3901240Z [W1204 09:27:21.851849223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3901617Z 2025-12-04T10:11:57.3901914Z [W1204 09:27:21.851974795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3902353Z 2025-12-04T10:11:57.3902650Z [W1204 09:27:21.852190648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3903018Z 2025-12-04T10:11:57.3903313Z [W1204 09:27:21.852325791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3903680Z 2025-12-04T10:11:57.3903983Z [W1204 09:27:21.852536264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3904358Z 2025-12-04T10:11:57.3904653Z [W1204 09:27:21.852660006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3905026Z 2025-12-04T10:11:57.3905109Z ('RERUN', {'yellow': True}) [0.5634s] [100%] 2025-12-04T10:11:57.3906005Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:22.320709798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3906823Z 2025-12-04T10:11:57.3907124Z [W1204 09:27:22.320927482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3907493Z 2025-12-04T10:11:57.3907783Z [W1204 09:27:22.321077785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3908159Z 2025-12-04T10:11:57.3908451Z [W1204 09:27:22.321285018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3908826Z 2025-12-04T10:11:57.3909116Z [W1204 09:27:22.321410630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3909484Z 2025-12-04T10:11:57.3909782Z [W1204 09:27:22.321630174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3910149Z 2025-12-04T10:11:57.3910446Z [W1204 09:27:22.321755207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3910813Z 2025-12-04T10:11:57.3911105Z [W1204 09:27:22.321962870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3911482Z 2025-12-04T10:11:57.3911772Z [W1204 09:27:22.322093732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3912149Z 2025-12-04T10:11:57.3912441Z [W1204 09:27:22.328172587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3912808Z 2025-12-04T10:11:57.3913107Z [W1204 09:27:22.328354230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3913485Z 2025-12-04T10:11:57.3913789Z [W1204 09:27:22.328511113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3914158Z 2025-12-04T10:11:57.3914450Z [W1204 09:27:22.328738527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3914826Z 2025-12-04T10:11:57.3915117Z [W1204 09:27:22.328863869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3915489Z 2025-12-04T10:11:57.3915868Z [W1204 09:27:22.329092213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3916240Z 2025-12-04T10:11:57.3916538Z [W1204 09:27:22.329218635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3916973Z 2025-12-04T10:11:57.3917579Z [W1204 09:27:22.329425059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3917967Z 2025-12-04T10:11:57.3918261Z [W1204 09:27:22.329548061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3918639Z 2025-12-04T10:11:57.3918928Z [W1204 09:27:22.413363729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3919314Z 2025-12-04T10:11:57.3919609Z [W1204 09:27:22.413550912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3920053Z 2025-12-04T10:11:57.3920346Z [W1204 09:27:22.413699245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3920718Z 2025-12-04T10:11:57.3921013Z [W1204 09:27:22.413903948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3921382Z 2025-12-04T10:11:57.3921669Z [W1204 09:27:22.414033670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3922042Z 2025-12-04T10:11:57.3922329Z [W1204 09:27:22.414246694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3922702Z 2025-12-04T10:11:57.3922996Z [W1204 09:27:22.414373556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3923367Z 2025-12-04T10:11:57.3923657Z [W1204 09:27:22.414579140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.3924031Z 2025-12-04T10:11:57.3924324Z [W1204 09:27:22.414701352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.3924692Z 2025-12-04T10:11:57.3924764Z FAILED [0.5630s] [100%] 2025-12-04T10:11:57.3924875Z 2025-12-04T10:11:57.3924968Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.3925456Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3925935Z Traceback (most recent call last): 2025-12-04T10:11:57.3926402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3926854Z method(*args, **kwargs) 2025-12-04T10:11:57.3927279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3927725Z method(*args, **kwargs) 2025-12-04T10:11:57.3928132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3928577Z with policy(): 2025-12-04T10:11:57.3928983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3929425Z raise RuntimeError(msg) 2025-12-04T10:11:57.3930518Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.3931415Z 2025-12-04T10:11:57.3931549Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3932312Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3933030Z 2025-12-04T10:11:57.3933200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3933572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3933890Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3934426Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3934996Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3935267Z graph_break [] 2025-12-04T10:11:57.3935492Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.3936415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.3937267Z if out == self.unknown_value: 2025-12-04T10:11:57.3937718Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3938182Z Traceback (most recent call last): 2025-12-04T10:11:57.3938636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3939075Z method(*args, **kwargs) 2025-12-04T10:11:57.3939515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3939959Z method(*args, **kwargs) 2025-12-04T10:11:57.3940379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3940816Z with policy(): 2025-12-04T10:11:57.3941220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3941681Z raise RuntimeError(msg) 2025-12-04T10:11:57.3942632Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.3943541Z 2025-12-04T10:11:57.3943674Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3944433Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3945056Z 2025-12-04T10:11:57.3945218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3945593Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3945899Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3946430Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3946987Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3947255Z graph_break [] 2025-12-04T10:11:57.3947579Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.3948675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.3949769Z if out == self.unknown_value: 2025-12-04T10:11:57.3950023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3950321Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3950618Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3951177Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3951653Z graph_break [] 2025-12-04T10:11:57.3951835Z =================================== FAILURES =================================== 2025-12-04T10:11:57.3952327Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3952793Z Traceback (most recent call last): 2025-12-04T10:11:57.3953244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3953695Z method(*args, **kwargs) 2025-12-04T10:11:57.3954112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3954548Z method(*args, **kwargs) 2025-12-04T10:11:57.3954965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3955399Z with policy(): 2025-12-04T10:11:57.3955802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3956241Z raise RuntimeError(msg) 2025-12-04T10:11:57.3957190Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.3958110Z 2025-12-04T10:11:57.3958241Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3958993Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3959603Z 2025-12-04T10:11:57.3959769Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3960213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3960523Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3961050Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3961599Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3961868Z graph_break [] 2025-12-04T10:11:57.3962090Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.3963003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.3963839Z if out == self.unknown_value: 2025-12-04T10:11:57.3964098Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3964480Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3964784Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3965339Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3965912Z graph_break [] 2025-12-04T10:11:57.3966136Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3966433Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3966730Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3967286Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3967770Z graph_break [] 2025-12-04T10:11:57.3968352Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml - 2025-12-04T10:11:57.3969021Z =========================== short test summary info ============================ 2025-12-04T10:11:57.3970542Z FAILED [0.5630s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.3971927Z 2025-12-04T10:11:57.3972055Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3972806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3973415Z 2025-12-04T10:11:57.3973577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3973923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.3974229Z ================== 1 failed, 57 deselected, 2 rerun in 12.40s ================== 2025-12-04T10:11:57.3974490Z Got exit code 1 2025-12-04T10:11:57.3975071Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3975876Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.3976463Z W1204 09:27:29.093000 34706 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.3977184Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml 2025-12-04T10:11:57.3977738Z ============================= test session starts ============================== 2025-12-04T10:11:57.3978141Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.3978510Z cachedir: .pytest_cache 2025-12-04T10:11:57.3978928Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.3979383Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.3979596Z configfile: 
pytest.ini 2025-12-04T10:11:57.3980025Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.3980616Z collecting ... collected 58 items / 1 deselected / 57 selected 2025-12-04T10:11:57.3980910Z stepcurrent: skipping 1 already run items. 2025-12-04T10:11:57.3981138Z Running 57 items in this shard 2025-12-04T10:11:57.3981261Z 2025-12-04T10:11:57.3981782Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9997s] [ 1%] 2025-12-04T10:11:57.3982936Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6045s] [ 1%] 2025-12-04T10:11:57.3983978Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5944s] [ 1%] 2025-12-04T10:11:57.3984520Z 2025-12-04T10:11:57.3984608Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.3985090Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3985546Z Traceback (most recent call last): 2025-12-04T10:11:57.3986002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3986454Z method(*args, **kwargs) 2025-12-04T10:11:57.3986876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3987311Z method(*args, **kwargs) 2025-12-04T10:11:57.3987725Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3988168Z with policy(): 2025-12-04T10:11:57.3988569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3989010Z raise RuntimeError(msg) 2025-12-04T10:11:57.3989957Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.3990855Z 2025-12-04T10:11:57.3990996Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.3991748Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.3992355Z 2025-12-04T10:11:57.3992515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.3992885Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.3993197Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.3993728Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.3994290Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.3994555Z graph_break [] 2025-12-04T10:11:57.3994957Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.3995413Z Traceback (most recent call last): 
2025-12-04T10:11:57.3995860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3996295Z method(*args, **kwargs) 2025-12-04T10:11:57.3996706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.3997234Z method(*args, **kwargs) 2025-12-04T10:11:57.3997644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.3998074Z with policy(): 2025-12-04T10:11:57.3998467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.3998975Z raise RuntimeError(msg) 2025-12-04T10:11:57.4000005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4000904Z 2025-12-04T10:11:57.4001035Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4001776Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4002382Z 2025-12-04T10:11:57.4002539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4002906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4003208Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4003735Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4004281Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4004545Z graph_break [] 2025-12-04T10:11:57.4004757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4005058Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4005347Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4005902Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4006385Z graph_break [] 2025-12-04T10:11:57.4006552Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4007029Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4007484Z Traceback (most recent call last): 2025-12-04T10:11:57.4007935Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4008376Z method(*args, **kwargs) 2025-12-04T10:11:57.4008798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4009244Z method(*args, **kwargs) 2025-12-04T10:11:57.4009647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4010080Z with policy(): 2025-12-04T10:11:57.4010475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4010917Z raise RuntimeError(msg) 2025-12-04T10:11:57.4011866Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4012771Z 2025-12-04T10:11:57.4012899Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4013714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4014329Z 2025-12-04T10:11:57.4014494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4014922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4015217Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4015740Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4016290Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4016550Z graph_break [] 2025-12-04T10:11:57.4016772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4017299Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4017604Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4018157Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4018647Z graph_break [] 2025-12-04T10:11:57.4018865Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4019163Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4019456Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4019998Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), 
('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4020480Z graph_break [] 2025-12-04T10:11:57.4021064Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml - 2025-12-04T10:11:57.4021727Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4023231Z FAILED [0.5944s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4024614Z 2025-12-04T10:11:57.4024744Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4025488Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4026097Z 2025-12-04T10:11:57.4026259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4026597Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4026917Z =================== 1 failed, 1 deselected, 2 rerun in 3.22s =================== 2025-12-04T10:11:57.4027167Z Got exit code 1 2025-12-04T10:11:57.4027320Z Retrying single test... 
2025-12-04T10:11:57.4027699Z W1204 09:27:38.843000 34888 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4028423Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml 2025-12-04T10:11:57.4028985Z ============================= test session starts ============================== 2025-12-04T10:11:57.4029486Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4029839Z cachedir: .pytest_cache 2025-12-04T10:11:57.4030252Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4030803Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4031012Z configfile: pytest.ini 2025-12-04T10:11:57.4031443Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4031962Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4032751Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4033475Z Running 1 items in this shard 2025-12-04T10:11:57.4033604Z 2025-12-04T10:11:57.4034473Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:40.949417502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4035292Z 2025-12-04T10:11:57.4035610Z [W1204 09:27:49.069140543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4035988Z 2025-12-04T10:11:57.4036290Z [W1204 09:27:49.069422077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4036656Z 2025-12-04T10:11:57.4036946Z [W1204 09:27:49.069988306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4037321Z 2025-12-04T10:11:57.4037656Z [W1204 09:27:49.070224220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4038096Z 2025-12-04T10:11:57.4038438Z [W1204 09:27:49.071513610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4038933Z 2025-12-04T10:11:57.4039283Z [W1204 09:27:49.071711683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4039719Z 2025-12-04T10:11:57.4040064Z [W1204 09:27:49.072040268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4040436Z 2025-12-04T10:11:57.4040726Z [W1204 09:27:49.072230831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4041099Z 2025-12-04T10:11:57.4041390Z [W1204 09:27:49.080743223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4041762Z 2025-12-04T10:11:57.4042054Z [W1204 09:27:49.080965406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4042424Z 2025-12-04T10:11:57.4042720Z [W1204 09:27:49.081135199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4043086Z 2025-12-04T10:11:57.4043379Z [W1204 09:27:49.081368253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4043745Z 2025-12-04T10:11:57.4044037Z [W1204 09:27:49.081505155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4044409Z 2025-12-04T10:11:57.4044777Z [W1204 09:27:49.081753389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4045151Z 2025-12-04T10:11:57.4045439Z [W1204 09:27:49.081906311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4045810Z 2025-12-04T10:11:57.4046188Z [W1204 09:27:49.082142505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4046556Z 2025-12-04T10:11:57.4046849Z [W1204 09:27:49.082280397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4047221Z 2025-12-04T10:11:57.4047508Z [W1204 09:27:49.170709500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4047879Z 2025-12-04T10:11:57.4048170Z [W1204 09:27:49.170927173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4048543Z 2025-12-04T10:11:57.4048831Z [W1204 09:27:49.171078615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4049204Z 2025-12-04T10:11:57.4049491Z [W1204 09:27:49.171290079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4049859Z 2025-12-04T10:11:57.4050152Z [W1204 09:27:49.171413541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4050520Z 2025-12-04T10:11:57.4050809Z [W1204 09:27:49.171630904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4051187Z 2025-12-04T10:11:57.4051479Z [W1204 09:27:49.171758116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4051854Z 2025-12-04T10:11:57.4052148Z [W1204 09:27:49.171961039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4052518Z 2025-12-04T10:11:57.4052807Z [W1204 09:27:49.172081811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4053179Z 2025-12-04T10:11:57.4053268Z ('RERUN', {'yellow': True}) [11.1332s] [100%] 2025-12-04T10:11:57.4054204Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:50.402986024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4055018Z 2025-12-04T10:11:57.4055311Z [W1204 09:27:50.403245848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4055687Z 2025-12-04T10:11:57.4055977Z [W1204 09:27:50.403400511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4056348Z 2025-12-04T10:11:57.4056637Z [W1204 09:27:50.403614084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4057005Z 2025-12-04T10:11:57.4057299Z [W1204 09:27:50.403737906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4057674Z 2025-12-04T10:11:57.4057977Z [W1204 09:27:50.403958079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4058346Z 2025-12-04T10:11:57.4058635Z [W1204 09:27:50.404083711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4059005Z 2025-12-04T10:11:57.4059366Z [W1204 09:27:50.404287934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4059741Z 2025-12-04T10:11:57.4060029Z [W1204 09:27:50.404420296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4060462Z 2025-12-04T10:11:57.4060757Z [W1204 09:27:50.410670873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4061126Z 2025-12-04T10:11:57.4061419Z [W1204 09:27:50.410844486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4061796Z 2025-12-04T10:11:57.4062088Z [W1204 09:27:50.410994738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4062457Z 2025-12-04T10:11:57.4062750Z [W1204 09:27:50.411199401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4063121Z 2025-12-04T10:11:57.4063411Z [W1204 09:27:50.411329203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4063781Z 2025-12-04T10:11:57.4064077Z [W1204 09:27:50.411546347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4064443Z 2025-12-04T10:11:57.4064736Z [W1204 09:27:50.411676109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4065107Z 2025-12-04T10:11:57.4065397Z [W1204 09:27:50.411878402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4065768Z 2025-12-04T10:11:57.4066064Z [W1204 09:27:50.412003374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4066444Z 2025-12-04T10:11:57.4066736Z [W1204 09:27:50.496256071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4067111Z 2025-12-04T10:11:57.4067403Z [W1204 09:27:50.496491995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4067770Z 2025-12-04T10:11:57.4068074Z [W1204 09:27:50.496643717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4068447Z 2025-12-04T10:11:57.4068742Z [W1204 09:27:50.496850880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4069112Z 2025-12-04T10:11:57.4069405Z [W1204 09:27:50.496978342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4069776Z 2025-12-04T10:11:57.4070070Z [W1204 09:27:50.497195946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4070449Z 2025-12-04T10:11:57.4070738Z [W1204 09:27:50.497319357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4071106Z 2025-12-04T10:11:57.4071401Z [W1204 09:27:50.497521811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4071764Z 2025-12-04T10:11:57.4072058Z [W1204 09:27:50.497641202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4072425Z 2025-12-04T10:11:57.4072504Z ('RERUN', {'yellow': True}) [0.5637s] [100%] 2025-12-04T10:11:57.4073468Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:51.966659812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4074360Z 2025-12-04T10:11:57.4074658Z [W1204 09:27:51.966875585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4075025Z 2025-12-04T10:11:57.4075323Z [W1204 09:27:51.967030348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4075690Z 2025-12-04T10:11:57.4075985Z [W1204 09:27:51.967240971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4076352Z 2025-12-04T10:11:57.4076642Z [W1204 09:27:51.967366443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4077014Z 2025-12-04T10:11:57.4077302Z [W1204 09:27:51.967583856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4077681Z 2025-12-04T10:11:57.4077969Z [W1204 09:27:51.967706648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4078334Z 2025-12-04T10:11:57.4078631Z [W1204 09:27:51.967912961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4078997Z 2025-12-04T10:11:57.4079286Z [W1204 09:27:51.968033353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4079653Z 2025-12-04T10:11:57.4080011Z [W1204 09:27:51.974189459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4080383Z 2025-12-04T10:11:57.4080672Z [W1204 09:27:51.974365261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4081044Z 2025-12-04T10:11:57.4081337Z [W1204 09:27:51.974516304 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4081707Z 2025-12-04T10:11:57.4081997Z [W1204 09:27:51.974718647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4082363Z 2025-12-04T10:11:57.4082656Z [W1204 09:27:51.974841449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4083024Z 2025-12-04T10:11:57.4083314Z [W1204 09:27:51.975055812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4083687Z 2025-12-04T10:11:57.4083980Z [W1204 09:27:51.975178864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4084350Z 2025-12-04T10:11:57.4084637Z [W1204 09:27:51.975395848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4085012Z 2025-12-04T10:11:57.4085304Z [W1204 09:27:51.975515579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4085672Z 2025-12-04T10:11:57.4085963Z [W1204 09:27:51.059682155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4086332Z 2025-12-04T10:11:57.4086627Z [W1204 09:27:51.059867718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4086991Z 2025-12-04T10:11:57.4087359Z [W1204 09:27:51.060038190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4087742Z 2025-12-04T10:11:57.4088034Z [W1204 09:27:51.060254184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4088480Z 2025-12-04T10:11:57.4088768Z [W1204 09:27:51.060391086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4089138Z 2025-12-04T10:11:57.4089433Z [W1204 09:27:51.060606109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4089799Z 2025-12-04T10:11:57.4090092Z [W1204 09:27:51.060728991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4090461Z 2025-12-04T10:11:57.4090754Z [W1204 09:27:51.060927724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4091126Z 2025-12-04T10:11:57.4091412Z [W1204 09:27:51.061046146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4091784Z 2025-12-04T10:11:57.4091845Z FAILED [0.5588s] [100%] 2025-12-04T10:11:57.4091950Z 2025-12-04T10:11:57.4092041Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4092516Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4092977Z Traceback (most recent call last): 2025-12-04T10:11:57.4093436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4093884Z method(*args, **kwargs) 2025-12-04T10:11:57.4094297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4094734Z method(*args, **kwargs) 2025-12-04T10:11:57.4095138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4095568Z with policy(): 2025-12-04T10:11:57.4095967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.4096412Z raise RuntimeError(msg) 2025-12-04T10:11:57.4097362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4098252Z 2025-12-04T10:11:57.4098387Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4099133Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4099753Z 2025-12-04T10:11:57.4099915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4100293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4100603Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4101130Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4101682Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4101947Z graph_break [] 2025-12-04T10:11:57.4102244Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4103161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4104081Z if out == self.unknown_value: 2025-12-04T10:11:57.4104513Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4104970Z Traceback (most recent call last): 2025-12-04T10:11:57.4105415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4105858Z method(*args, **kwargs) 2025-12-04T10:11:57.4106268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4106702Z method(*args, **kwargs) 2025-12-04T10:11:57.4107106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4107534Z with policy(): 2025-12-04T10:11:57.4107931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4108377Z raise RuntimeError(msg) 2025-12-04T10:11:57.4109319Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4110214Z 2025-12-04T10:11:57.4110349Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4111107Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4111719Z 2025-12-04T10:11:57.4111877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4112241Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4112543Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4113067Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4113617Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4113882Z graph_break [] 2025-12-04T10:11:57.4114099Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4114998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4115839Z if out == self.unknown_value: 2025-12-04T10:11:57.4116094Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4116392Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4116680Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4117557Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4118050Z graph_break [] 2025-12-04T10:11:57.4118229Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4118833Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4119297Z Traceback (most recent call last): 2025-12-04T10:11:57.4119747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4120332Z method(*args, **kwargs) 2025-12-04T10:11:57.4120745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4121184Z method(*args, **kwargs) 2025-12-04T10:11:57.4121593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4122024Z with policy(): 2025-12-04T10:11:57.4122417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4122862Z raise RuntimeError(msg) 2025-12-04T10:11:57.4123808Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4124710Z 2025-12-04T10:11:57.4124838Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4125607Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4126232Z 2025-12-04T10:11:57.4126391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4126760Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4127066Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4127605Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4128158Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4128435Z graph_break [] 2025-12-04T10:11:57.4128661Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4129562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4130398Z if out == self.unknown_value: 2025-12-04T10:11:57.4130648Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4130946Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4131241Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4131796Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4132274Z graph_break [] 2025-12-04T10:11:57.4132494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4132794Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4133083Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4133625Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4134122Z graph_break [] 2025-12-04T10:11:57.4134712Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml - 2025-12-04T10:11:57.4135526Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4137035Z FAILED [0.5588s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4138487Z 2025-12-04T10:11:57.4138613Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4139353Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4139967Z 2025-12-04T10:11:57.4140124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4140472Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4140781Z ================== 1 failed, 57 deselected, 2 rerun in 12.28s ================== 2025-12-04T10:11:57.4141048Z Got exit code 1 2025-12-04T10:11:57.4141207Z Retrying single test... 2025-12-04T10:11:57.4141578Z W1204 09:27:57.709000 35075 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4142302Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml 2025-12-04T10:11:57.4142864Z ============================= test session starts ============================== 2025-12-04T10:11:57.4143251Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4143601Z cachedir: .pytest_cache 2025-12-04T10:11:57.4144020Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4144475Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4144692Z configfile: pytest.ini 2025-12-04T10:11:57.4145113Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4145633Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4146425Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4147151Z Running 1 items in this shard 2025-12-04T10:11:57.4147280Z 2025-12-04T10:11:57.4148034Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:58.825054296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4148855Z 2025-12-04T10:11:57.4149161Z [W1204 09:28:08.011242279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4149551Z 2025-12-04T10:11:57.4149846Z [W1204 09:28:08.011502094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4150218Z 2025-12-04T10:11:57.4150515Z [W1204 09:28:08.012093414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4150881Z 2025-12-04T10:11:57.4151173Z [W1204 09:28:08.012282787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4151615Z 2025-12-04T10:11:57.4151906Z [W1204 09:28:08.013556118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4152279Z 2025-12-04T10:11:57.4152570Z [W1204 09:28:08.013750791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4153006Z 2025-12-04T10:11:57.4153299Z [W1204 09:28:08.014014376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4153665Z 2025-12-04T10:11:57.4153965Z [W1204 09:28:08.014172918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4154341Z 2025-12-04T10:11:57.4154638Z [W1204 09:28:08.022570768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4155012Z 2025-12-04T10:11:57.4155303Z [W1204 09:28:08.022789742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4155678Z 2025-12-04T10:11:57.4155966Z [W1204 09:28:08.022968945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4156341Z 2025-12-04T10:11:57.4156633Z [W1204 09:28:08.023203039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4157001Z 2025-12-04T10:11:57.4157296Z [W1204 09:28:08.023351291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4157661Z 2025-12-04T10:11:57.4157956Z [W1204 09:28:08.023593635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4158329Z 2025-12-04T10:11:57.4158623Z [W1204 09:28:08.023737138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4158995Z 2025-12-04T10:11:57.4159287Z [W1204 09:28:08.023973122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4159660Z 2025-12-04T10:11:57.4159989Z [W1204 09:28:08.024116094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4160362Z 2025-12-04T10:11:57.4160651Z [W1204 09:28:08.111461288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4161017Z 2025-12-04T10:11:57.4161310Z [W1204 09:28:08.111675372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4161677Z 2025-12-04T10:11:57.4161976Z [W1204 09:28:08.111831764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4162345Z 2025-12-04T10:11:57.4162632Z [W1204 09:28:08.112041768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4163010Z 2025-12-04T10:11:57.4163304Z [W1204 09:28:08.112165320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4163678Z 2025-12-04T10:11:57.4163969Z [W1204 09:28:08.112388564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4164336Z 2025-12-04T10:11:57.4164648Z [W1204 09:28:08.112513096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4165013Z 2025-12-04T10:11:57.4165440Z [W1204 09:28:08.112716989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4165810Z 2025-12-04T10:11:57.4166101Z [W1204 09:28:08.112839481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4166539Z 2025-12-04T10:11:57.4166632Z ('RERUN', {'yellow': True}) [11.2058s] [100%] 2025-12-04T10:11:57.4167530Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:09.344623118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4168340Z 2025-12-04T10:11:57.4168637Z [W1204 09:28:09.344883732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4169006Z 2025-12-04T10:11:57.4169308Z [W1204 09:28:09.345034504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4169679Z 2025-12-04T10:11:57.4169969Z [W1204 09:28:09.345244428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4170347Z 2025-12-04T10:11:57.4170635Z [W1204 09:28:09.345372390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4171003Z 2025-12-04T10:11:57.4171289Z [W1204 09:28:09.345588794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4171657Z 2025-12-04T10:11:57.4171954Z [W1204 09:28:09.345717676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4172319Z 2025-12-04T10:11:57.4172619Z [W1204 09:28:09.345921659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4172986Z 2025-12-04T10:11:57.4173276Z [W1204 09:28:09.346043381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4173653Z 2025-12-04T10:11:57.4173944Z [W1204 09:28:09.352252994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4174324Z 2025-12-04T10:11:57.4174616Z [W1204 09:28:09.352442427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4174986Z 2025-12-04T10:11:57.4175283Z [W1204 09:28:09.352594560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4175651Z 2025-12-04T10:11:57.4175947Z [W1204 09:28:09.352796913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4176314Z 2025-12-04T10:11:57.4176604Z [W1204 09:28:09.352920245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4176982Z 2025-12-04T10:11:57.4177280Z [W1204 09:28:09.353134279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4177652Z 2025-12-04T10:11:57.4177940Z [W1204 09:28:09.353258461 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4178311Z 2025-12-04T10:11:57.4178601Z [W1204 09:28:09.353459474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4178970Z 2025-12-04T10:11:57.4179269Z [W1204 09:28:09.353586586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4179715Z 2025-12-04T10:11:57.4180010Z [W1204 09:28:09.437295741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4180378Z 2025-12-04T10:11:57.4180669Z [W1204 09:28:09.437515364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4181899Z 2025-12-04T10:11:57.4182191Z [W1204 09:28:09.437666677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4182568Z 2025-12-04T10:11:57.4182859Z [W1204 09:28:09.437873880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4183226Z 2025-12-04T10:11:57.4183522Z [W1204 09:28:09.437995302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4183890Z 2025-12-04T10:11:57.4184188Z [W1204 09:28:09.438207606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4184555Z 2025-12-04T10:11:57.4184850Z [W1204 09:28:09.438329368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4185227Z 2025-12-04T10:11:57.4185521Z [W1204 09:28:09.438529831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4185893Z 2025-12-04T10:11:57.4186185Z [W1204 09:28:09.438651383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4186551Z 2025-12-04T10:11:57.4186634Z ('RERUN', {'yellow': True}) [0.5623s] [100%] 2025-12-04T10:11:57.4187528Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:09.904736796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4188341Z 2025-12-04T10:11:57.4188631Z [W1204 09:28:09.904961440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4189009Z 2025-12-04T10:11:57.4189299Z [W1204 09:28:09.905113692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4189670Z 2025-12-04T10:11:57.4189959Z [W1204 09:28:09.905325546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4190325Z 2025-12-04T10:11:57.4190621Z [W1204 09:28:09.905449328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4190990Z 2025-12-04T10:11:57.4191287Z [W1204 09:28:09.905666231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4191654Z 2025-12-04T10:11:57.4191943Z [W1204 09:28:09.905789224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4192334Z 2025-12-04T10:11:57.4192626Z [W1204 09:28:09.905991427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4192996Z 2025-12-04T10:11:57.4193284Z [W1204 09:28:09.906112669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4193650Z 2025-12-04T10:11:57.4193943Z [W1204 09:28:09.912176880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4194313Z 2025-12-04T10:11:57.4194683Z [W1204 09:28:09.912361173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4195050Z 2025-12-04T10:11:57.4195344Z [W1204 09:28:09.912512046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4195777Z 2025-12-04T10:11:57.4196068Z [W1204 09:28:09.912721359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4196441Z 2025-12-04T10:11:57.4196731Z [W1204 09:28:09.912850481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4197102Z 2025-12-04T10:11:57.4197390Z [W1204 09:28:09.913063935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4197754Z 2025-12-04T10:11:57.4198049Z [W1204 09:28:09.913190427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4198413Z 2025-12-04T10:11:57.4198709Z [W1204 09:28:09.913394640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4199082Z 2025-12-04T10:11:57.4199375Z [W1204 09:28:09.913516633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4199746Z 2025-12-04T10:11:57.4200092Z [W1204 09:28:10.996962642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4200467Z 2025-12-04T10:11:57.4200757Z [W1204 09:28:10.997148606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4201123Z 2025-12-04T10:11:57.4201422Z [W1204 09:28:10.997298138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4201789Z 2025-12-04T10:11:57.4202085Z [W1204 09:28:10.997502841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4202455Z 2025-12-04T10:11:57.4202746Z [W1204 09:28:10.997625823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4203116Z 2025-12-04T10:11:57.4203404Z [W1204 09:28:10.997841037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4203776Z 2025-12-04T10:11:57.4204064Z [W1204 09:28:10.997968099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4204429Z 2025-12-04T10:11:57.4204730Z [W1204 09:28:10.998169532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4205099Z 2025-12-04T10:11:57.4205393Z [W1204 09:28:10.998290774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4205761Z 2025-12-04T10:11:57.4205827Z FAILED [0.5607s] [100%] 2025-12-04T10:11:57.4205936Z 2025-12-04T10:11:57.4206022Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4206498Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4206959Z Traceback (most recent call last): 2025-12-04T10:11:57.4207409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4207856Z method(*args, **kwargs) 2025-12-04T10:11:57.4208350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4208791Z method(*args, **kwargs) 2025-12-04T10:11:57.4209200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4209636Z with policy(): 2025-12-04T10:11:57.4210104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4257937Z raise RuntimeError(msg) 2025-12-04T10:11:57.4259000Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.4259912Z 2025-12-04T10:11:57.4260055Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4260829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4261455Z 2025-12-04T10:11:57.4261618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4262014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4262327Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4262879Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4263436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4263704Z graph_break [] 2025-12-04T10:11:57.4263929Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4264874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4265726Z if out == self.unknown_value: 2025-12-04T10:11:57.4266178Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4266640Z Traceback (most recent call last): 2025-12-04T10:11:57.4267091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4267535Z method(*args, **kwargs) 2025-12-04T10:11:57.4267955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4268388Z method(*args, **kwargs) 2025-12-04T10:11:57.4268790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4269225Z with policy(): 2025-12-04T10:11:57.4269618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4270062Z raise RuntimeError(msg) 2025-12-04T10:11:57.4271014Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4271922Z 2025-12-04T10:11:57.4272059Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4272976Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4273597Z 2025-12-04T10:11:57.4273765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4274141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4274561Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4275096Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4275653Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4275919Z graph_break [] 2025-12-04T10:11:57.4276139Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4277059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4277914Z if out == self.unknown_value: 2025-12-04T10:11:57.4278171Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4278482Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4278781Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4279345Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4279832Z graph_break [] 2025-12-04T10:11:57.4280053Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4280541Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4281006Z Traceback (most recent call last): 2025-12-04T10:11:57.4281463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4281918Z method(*args, **kwargs) 2025-12-04T10:11:57.4282333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4282783Z method(*args, **kwargs) 2025-12-04T10:11:57.4283187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4283617Z with policy(): 2025-12-04T10:11:57.4284010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4284453Z raise RuntimeError(msg) 2025-12-04T10:11:57.4285400Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4286306Z 2025-12-04T10:11:57.4286434Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4287188Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4287809Z 2025-12-04T10:11:57.4287971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4288337Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4288640Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4289238Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4289790Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4290054Z graph_break [] 2025-12-04T10:11:57.4290265Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4291244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4292076Z if out == self.unknown_value: 2025-12-04T10:11:57.4292333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4292639Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4292933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4293495Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4293985Z graph_break [] 2025-12-04T10:11:57.4294201Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4294515Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4294813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4295355Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4295835Z graph_break [] 2025-12-04T10:11:57.4296416Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml - 2025-12-04T10:11:57.4297077Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4298595Z FAILED [0.5607s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4299969Z 2025-12-04T10:11:57.4300101Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4300841Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4301457Z 2025-12-04T10:11:57.4301623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4301968Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4302266Z ================== 1 failed, 57 deselected, 2 rerun in 12.35s ================== 2025-12-04T10:11:57.4302532Z Got exit code 1 2025-12-04T10:11:57.4303119Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4303920Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.4304498Z W1204 09:28:16.790000 35262 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4305220Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml 2025-12-04T10:11:57.4305858Z ============================= test session starts ============================== 2025-12-04T10:11:57.4306252Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4306601Z cachedir: .pytest_cache 2025-12-04T10:11:57.4307015Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4307569Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4307780Z configfile: 
pytest.ini 2025-12-04T10:11:57.4308201Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4308729Z collecting ... collected 58 items / 2 deselected / 56 selected 2025-12-04T10:11:57.4309022Z stepcurrent: skipping 2 already run items. 2025-12-04T10:11:57.4309244Z Running 56 items in this shard 2025-12-04T10:11:57.4309370Z 2025-12-04T10:11:57.4309897Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0043s] [ 1%] 2025-12-04T10:11:57.4310993Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5955s] [ 1%] 2025-12-04T10:11:57.4312038Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.6029s] [ 1%] 2025-12-04T10:11:57.4312577Z 2025-12-04T10:11:57.4312668Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4313145Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4313604Z Traceback (most recent call last): 2025-12-04T10:11:57.4314062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4314508Z method(*args, **kwargs) 2025-12-04T10:11:57.4314921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4315363Z method(*args, **kwargs) 2025-12-04T10:11:57.4315771Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4316202Z with policy(): 2025-12-04T10:11:57.4316601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4317224Z raise RuntimeError(msg) 2025-12-04T10:11:57.4318169Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4319058Z 2025-12-04T10:11:57.4319188Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4319985Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4320605Z 2025-12-04T10:11:57.4320766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4321133Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4321437Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4321977Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4322648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4322918Z graph_break [] 2025-12-04T10:11:57.4323321Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4323879Z Traceback (most recent call last): 
2025-12-04T10:11:57.4324332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4324785Z method(*args, **kwargs) 2025-12-04T10:11:57.4325208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4325651Z method(*args, **kwargs) 2025-12-04T10:11:57.4326064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4326495Z with policy(): 2025-12-04T10:11:57.4326907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4327348Z raise RuntimeError(msg) 2025-12-04T10:11:57.4328296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4329202Z 2025-12-04T10:11:57.4329337Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4330079Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4330696Z 2025-12-04T10:11:57.4330874Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4331248Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4331559Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4332083Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4332644Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4332909Z graph_break [] 2025-12-04T10:11:57.4333131Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4333435Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4333727Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4334284Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4334765Z graph_break [] 2025-12-04T10:11:57.4334938Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4335422Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4335896Z Traceback (most recent call last): 2025-12-04T10:11:57.4336350Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4336792Z method(*args, **kwargs) 2025-12-04T10:11:57.4337202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4337640Z method(*args, **kwargs) 2025-12-04T10:11:57.4338046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4338480Z with policy(): 2025-12-04T10:11:57.4338854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4338927Z raise RuntimeError(msg) 2025-12-04T10:11:57.4339763Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4339831Z 2025-12-04T10:11:57.4339962Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4340505Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4340509Z 2025-12-04T10:11:57.4340672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4340804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4340899Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4341253Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4341395Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4341455Z graph_break [] 2025-12-04T10:11:57.4341584Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4341676Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4341798Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4342154Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4342214Z graph_break [] 2025-12-04T10:11:57.4342340Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4342435Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4342557Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4342903Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), 
('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4342961Z graph_break [] 2025-12-04T10:11:57.4343444Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml - 2025-12-04T10:11:57.4343552Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4344880Z FAILED [0.6029s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4344887Z 2025-12-04T10:11:57.4345018Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4345553Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4345556Z 2025-12-04T10:11:57.4345724Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4345901Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4346016Z =================== 1 failed, 2 deselected, 2 rerun in 3.23s =================== 2025-12-04T10:11:57.4346080Z Got exit code 1 2025-12-04T10:11:57.4346146Z Retrying single test... 
2025-12-04T10:11:57.4346481Z W1204 09:28:26.566000 35444 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4346880Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml 2025-12-04T10:11:57.4346977Z ============================= test session starts ============================== 2025-12-04T10:11:57.4347193Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4347260Z cachedir: .pytest_cache 2025-12-04T10:11:57.4347574Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4347654Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4347721Z configfile: pytest.ini 2025-12-04T10:11:57.4348047Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4348177Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4348764Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4348842Z Running 1 items in this shard 2025-12-04T10:11:57.4348846Z 2025-12-04T10:11:57.4349596Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:27.698347953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4349600Z 2025-12-04T10:11:57.4349906Z [W1204 09:28:36.851377013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4349912Z 2025-12-04T10:11:57.4350210Z [W1204 09:28:36.851659058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4350214Z 2025-12-04T10:11:57.4350505Z [W1204 09:28:36.852242108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4350508Z 2025-12-04T10:11:57.4350795Z [W1204 09:28:36.852474022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4350798Z 2025-12-04T10:11:57.4351096Z [W1204 09:28:36.853695653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4351099Z 2025-12-04T10:11:57.4351385Z [W1204 09:28:36.853869306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4351391Z 2025-12-04T10:11:57.4351679Z [W1204 09:28:36.854137190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4351686Z 2025-12-04T10:11:57.4351977Z [W1204 09:28:36.854298043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4351980Z 2025-12-04T10:11:57.4352267Z [W1204 09:28:36.862951890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4352271Z 2025-12-04T10:11:57.4352636Z [W1204 09:28:36.863174744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4352640Z 2025-12-04T10:11:57.4352930Z [W1204 09:28:36.863350477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4352933Z 2025-12-04T10:11:57.4353315Z [W1204 09:28:36.863589591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4353320Z 2025-12-04T10:11:57.4353612Z [W1204 09:28:36.863732553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4353614Z 2025-12-04T10:11:57.4353905Z [W1204 09:28:36.863974988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4353908Z 2025-12-04T10:11:57.4354199Z [W1204 09:28:36.864118190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4354205Z 2025-12-04T10:11:57.4354497Z [W1204 09:28:36.864351024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4354500Z 2025-12-04T10:11:57.4354786Z [W1204 09:28:36.864477686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4354792Z 2025-12-04T10:11:57.4355081Z [W1204 09:28:37.953122867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4355088Z 2025-12-04T10:11:57.4355375Z [W1204 09:28:37.953366841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4355378Z 2025-12-04T10:11:57.4355667Z [W1204 09:28:37.953537934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4355670Z 2025-12-04T10:11:57.4355965Z [W1204 09:28:37.953758458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4355968Z 2025-12-04T10:11:57.4356263Z [W1204 09:28:37.953883450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4356269Z 2025-12-04T10:11:57.4356561Z [W1204 09:28:37.954106884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4356565Z 2025-12-04T10:11:57.4356854Z [W1204 09:28:37.954231206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4356860Z 2025-12-04T10:11:57.4357147Z [W1204 09:28:37.954436709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4357150Z 2025-12-04T10:11:57.4357439Z [W1204 09:28:37.954558851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4357443Z 2025-12-04T10:11:57.4357529Z ('RERUN', {'yellow': True}) [11.1880s] [100%] 2025-12-04T10:11:57.4358282Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:38.178125644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4358288Z 2025-12-04T10:11:57.4358582Z [W1204 09:28:38.178385498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4358585Z 2025-12-04T10:11:57.4358875Z [W1204 09:28:38.178546261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4358878Z 2025-12-04T10:11:57.4359244Z [W1204 09:28:38.178757235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4359248Z 2025-12-04T10:11:57.4359539Z [W1204 09:28:38.178887237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4359607Z 2025-12-04T10:11:57.4359944Z [W1204 09:28:38.179105601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4359948Z 2025-12-04T10:11:57.4360241Z [W1204 09:28:38.179231523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4360245Z 2025-12-04T10:11:57.4360533Z [W1204 09:28:38.179438786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4360536Z 2025-12-04T10:11:57.4360830Z [W1204 09:28:38.179561598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4360833Z 2025-12-04T10:11:57.4361120Z [W1204 09:28:38.185660921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4361126Z 2025-12-04T10:11:57.4361416Z [W1204 09:28:38.185833584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4361419Z 2025-12-04T10:11:57.4361705Z [W1204 09:28:38.185985407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4361708Z 2025-12-04T10:11:57.4361996Z [W1204 09:28:38.186191970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4361999Z 2025-12-04T10:11:57.4362291Z [W1204 09:28:38.186314732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4362294Z 2025-12-04T10:11:57.4362583Z [W1204 09:28:38.186533006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4362590Z 2025-12-04T10:11:57.4362875Z [W1204 09:28:38.186660329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4362878Z 2025-12-04T10:11:57.4363163Z [W1204 09:28:38.186864252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4363169Z 2025-12-04T10:11:57.4363453Z [W1204 09:28:38.186988254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4363457Z 2025-12-04T10:11:57.4363744Z [W1204 09:28:38.268493464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4363747Z 2025-12-04T10:11:57.4364040Z [W1204 09:28:38.268716128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4364046Z 2025-12-04T10:11:57.4364335Z [W1204 09:28:38.268869891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4364338Z 2025-12-04T10:11:57.4364629Z [W1204 09:28:38.269075734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4364632Z 2025-12-04T10:11:57.4364919Z [W1204 09:28:38.269204096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4364922Z 2025-12-04T10:11:57.4365286Z [W1204 09:28:38.269415390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4365290Z 2025-12-04T10:11:57.4365576Z [W1204 09:28:38.269539462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4365579Z 2025-12-04T10:11:57.4365935Z [W1204 09:28:38.269745605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4365939Z 2025-12-04T10:11:57.4366229Z [W1204 09:28:38.269867707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4366233Z 2025-12-04T10:11:57.4366313Z ('RERUN', {'yellow': True}) [0.5509s] [100%] 2025-12-04T10:11:57.4367063Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:38.728016517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4367067Z 2025-12-04T10:11:57.4367355Z [W1204 09:28:38.728228120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4367358Z 2025-12-04T10:11:57.4367685Z [W1204 09:28:38.728395273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4367689Z 2025-12-04T10:11:57.4368030Z [W1204 09:28:38.728622327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4368033Z 2025-12-04T10:11:57.4368375Z [W1204 09:28:38.728750009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4368379Z 2025-12-04T10:11:57.4368720Z [W1204 09:28:38.728968203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4368723Z 2025-12-04T10:11:57.4369076Z [W1204 09:28:38.729091525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4369080Z 2025-12-04T10:11:57.4369418Z [W1204 09:28:38.729300199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4369425Z 2025-12-04T10:11:57.4369767Z [W1204 09:28:38.729422441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4369771Z 2025-12-04T10:11:57.4370080Z [W1204 09:28:38.735424492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4370083Z 2025-12-04T10:11:57.4370371Z [W1204 09:28:38.735597985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4370377Z 2025-12-04T10:11:57.4370672Z [W1204 09:28:38.735751138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4370675Z 2025-12-04T10:11:57.4370964Z [W1204 09:28:38.735957301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4370970Z 2025-12-04T10:11:57.4371261Z [W1204 09:28:38.736081213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4371264Z 2025-12-04T10:11:57.4371552Z [W1204 09:28:38.736305957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4371555Z 2025-12-04T10:11:57.4371846Z [W1204 09:28:38.736432049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4371849Z 2025-12-04T10:11:57.4372213Z [W1204 09:28:38.736641363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4372217Z 2025-12-04T10:11:57.4372509Z [W1204 09:28:38.736763855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4372577Z 2025-12-04T10:11:57.4372865Z [W1204 09:28:38.818823964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4372868Z 2025-12-04T10:11:57.4373158Z [W1204 09:28:38.819014158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4373165Z 2025-12-04T10:11:57.4373454Z [W1204 09:28:38.819166990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4373457Z 2025-12-04T10:11:57.4373756Z [W1204 09:28:38.819371504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4373759Z 2025-12-04T10:11:57.4374058Z [W1204 09:28:38.819493586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4374064Z 2025-12-04T10:11:57.4374351Z [W1204 09:28:38.819707899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4374354Z 2025-12-04T10:11:57.4374648Z [W1204 09:28:38.819830901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4374651Z 2025-12-04T10:11:57.4374939Z [W1204 09:28:38.820055615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4374942Z 2025-12-04T10:11:57.4375238Z [W1204 09:28:38.820185158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4375241Z 2025-12-04T10:11:57.4375303Z FAILED [0.5554s] [100%] 2025-12-04T10:11:57.4375307Z 2025-12-04T10:11:57.4375397Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4375715Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4375792Z Traceback (most recent call last): 2025-12-04T10:11:57.4376111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4376177Z method(*args, **kwargs) 2025-12-04T10:11:57.4376472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4376541Z method(*args, **kwargs) 2025-12-04T10:11:57.4376835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4376901Z with policy(): 2025-12-04T10:11:57.4377196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.4377265Z raise RuntimeError(msg) 2025-12-04T10:11:57.4378097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4378101Z 2025-12-04T10:11:57.4378229Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4378870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4378875Z 2025-12-04T10:11:57.4379038Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4379172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4379350Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4379703Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4379839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4379898Z graph_break [] 2025-12-04T10:11:57.4380023Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4380726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4380801Z if out == self.unknown_value: 2025-12-04T10:11:57.4381119Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4381196Z Traceback (most recent call last): 2025-12-04T10:11:57.4381497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4381568Z method(*args, **kwargs) 2025-12-04T10:11:57.4381863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4381924Z method(*args, **kwargs) 2025-12-04T10:11:57.4382220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4382281Z with policy(): 2025-12-04T10:11:57.4382596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4382664Z raise RuntimeError(msg) 2025-12-04T10:11:57.4383502Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4383512Z 2025-12-04T10:11:57.4383641Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4384177Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4384181Z 2025-12-04T10:11:57.4384345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4384474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4384573Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4384936Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4385064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4385132Z graph_break [] 2025-12-04T10:11:57.4385256Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4385950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4386174Z if out == self.unknown_value: 2025-12-04T10:11:57.4386301Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4386398Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4386522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4386932Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4386996Z graph_break [] 2025-12-04T10:11:57.4387080Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4387390Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4387467Z Traceback (most recent call last): 2025-12-04T10:11:57.4387775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4387842Z method(*args, **kwargs) 2025-12-04T10:11:57.4388135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4388198Z method(*args, **kwargs) 2025-12-04T10:11:57.4388491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4388550Z with policy(): 2025-12-04T10:11:57.4388850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4388915Z raise RuntimeError(msg) 2025-12-04T10:11:57.4389746Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4389751Z 2025-12-04T10:11:57.4389881Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4390420Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4390427Z 2025-12-04T10:11:57.4390587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4390711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4390805Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4391153Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4391286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4391357Z graph_break [] 2025-12-04T10:11:57.4391482Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4392175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4392251Z if out == self.unknown_value: 2025-12-04T10:11:57.4392373Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4392469Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4392591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4392939Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4393091Z graph_break [] 2025-12-04T10:11:57.4393217Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4393310Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4393436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4393838Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4393900Z graph_break [] 2025-12-04T10:11:57.4394386Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml - 2025-12-04T10:11:57.4394484Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4395815Z FAILED [0.5554s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4395821Z 2025-12-04T10:11:57.4395946Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4396486Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4396490Z 2025-12-04T10:11:57.4396647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4396753Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4396882Z ================== 1 failed, 57 deselected, 2 rerun in 12.32s ================== 2025-12-04T10:11:57.4396942Z Got exit code 1 2025-12-04T10:11:57.4397016Z Retrying single test... 2025-12-04T10:11:57.4397281Z W1204 09:28:45.493000 35631 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4397675Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml 2025-12-04T10:11:57.4397772Z ============================= test session starts ============================== 2025-12-04T10:11:57.4397981Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4398052Z cachedir: .pytest_cache 2025-12-04T10:11:57.4398358Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4398438Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4398511Z configfile: pytest.ini 2025-12-04T10:11:57.4398827Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4398956Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4399548Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4399619Z Running 1 items in this shard 2025-12-04T10:11:57.4399623Z 2025-12-04T10:11:57.4400424Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:46.608875565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4400503Z 2025-12-04T10:11:57.4400808Z [W1204 09:28:55.661299219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4400811Z 2025-12-04T10:11:57.4401108Z [W1204 09:28:55.661568154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4401186Z 2025-12-04T10:11:57.4401480Z [W1204 09:28:55.662137653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4401483Z 2025-12-04T10:11:57.4401776Z [W1204 09:28:55.662353097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4401779Z 2025-12-04T10:11:57.4402069Z [W1204 09:28:55.663679850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4402072Z 2025-12-04T10:11:57.4402370Z [W1204 09:28:55.663908984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4402374Z 2025-12-04T10:11:57.4402660Z [W1204 09:28:55.664208669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4402665Z 2025-12-04T10:11:57.4402956Z [W1204 09:28:55.664381043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4402959Z 2025-12-04T10:11:57.4403249Z [W1204 09:28:55.672792299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4403252Z 2025-12-04T10:11:57.4403540Z [W1204 09:28:55.672992902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4403544Z 2025-12-04T10:11:57.4403838Z [W1204 09:28:55.673168125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4403841Z 2025-12-04T10:11:57.4404127Z [W1204 09:28:55.673405700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4404133Z 2025-12-04T10:11:57.4404425Z [W1204 09:28:55.673555352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4404428Z 2025-12-04T10:11:57.4404715Z [W1204 09:28:55.673794966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4404718Z 2025-12-04T10:11:57.4405014Z [W1204 09:28:55.673931609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4405017Z 2025-12-04T10:11:57.4405307Z [W1204 09:28:55.674154102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4405310Z 2025-12-04T10:11:57.4405605Z [W1204 09:28:55.674281285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4405610Z 2025-12-04T10:11:57.4405901Z [W1204 09:28:55.762080002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4405905Z 2025-12-04T10:11:57.4406193Z [W1204 09:28:55.762301165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4406201Z 2025-12-04T10:11:57.4406491Z [W1204 09:28:55.762455568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4406495Z 2025-12-04T10:11:57.4406868Z [W1204 09:28:55.762665382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4406872Z 2025-12-04T10:11:57.4407165Z [W1204 09:28:55.762796944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4407233Z 2025-12-04T10:11:57.4407520Z [W1204 09:28:55.763013458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4407524Z 2025-12-04T10:11:57.4407817Z [W1204 09:28:55.763142390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4407820Z 2025-12-04T10:11:57.4408122Z [W1204 09:28:55.763349084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4408126Z 2025-12-04T10:11:57.4408425Z [W1204 09:28:55.763473406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4408429Z 2025-12-04T10:11:57.4408511Z ('RERUN', {'yellow': True}) [11.0721s] [100%] 2025-12-04T10:11:57.4409262Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:57.998558854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4409269Z 2025-12-04T10:11:57.4409557Z [W1204 09:28:57.998823368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4409560Z 2025-12-04T10:11:57.4409851Z [W1204 09:28:57.998973751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4409860Z 2025-12-04T10:11:57.4410150Z [W1204 09:28:57.999189705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4410153Z 2025-12-04T10:11:57.4410444Z [W1204 09:28:57.999313617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4410450Z 2025-12-04T10:11:57.4410742Z [W1204 09:28:57.999532251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4410745Z 2025-12-04T10:11:57.4411035Z [W1204 09:28:57.999660063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4411038Z 2025-12-04T10:11:57.4411332Z [W1204 09:28:57.999863996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4411335Z 2025-12-04T10:11:57.4411626Z [W1204 09:28:57.999986858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4411629Z 2025-12-04T10:11:57.4411923Z [W1204 09:28:57.006375039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4411926Z 2025-12-04T10:11:57.4412218Z [W1204 09:28:57.006557693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4412221Z 2025-12-04T10:11:57.4412513Z [W1204 09:28:57.006711005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4412516Z 2025-12-04T10:11:57.4412805Z [W1204 09:28:57.006915919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4412808Z 2025-12-04T10:11:57.4413101Z [W1204 09:28:57.007044801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4413173Z 2025-12-04T10:11:57.4413474Z [W1204 09:28:57.007262295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4413478Z 2025-12-04T10:11:57.4413768Z [W1204 09:28:57.007389127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4413835Z 2025-12-04T10:11:57.4414129Z [W1204 09:28:57.007596911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4414132Z 2025-12-04T10:11:57.4414421Z [W1204 09:28:57.007725333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4414424Z 2025-12-04T10:11:57.4414715Z [W1204 09:28:57.093708618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4414719Z 2025-12-04T10:11:57.4415011Z [W1204 09:28:57.093928122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4415014Z 2025-12-04T10:11:57.4415311Z [W1204 09:28:57.094077664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4415317Z 2025-12-04T10:11:57.4415607Z [W1204 09:28:57.094291608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4415610Z 2025-12-04T10:11:57.4415899Z [W1204 09:28:57.094420730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4415907Z 2025-12-04T10:11:57.4416198Z [W1204 09:28:57.094637824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4416201Z 2025-12-04T10:11:57.4416496Z [W1204 09:28:57.094762416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4416499Z 2025-12-04T10:11:57.4416791Z [W1204 09:28:57.094966210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4416797Z 2025-12-04T10:11:57.4417234Z [W1204 09:28:57.095090702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4417238Z 2025-12-04T10:11:57.4417334Z ('RERUN', {'yellow': True}) [0.5724s] [100%] 2025-12-04T10:11:57.4418088Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:57.567704446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4418092Z 2025-12-04T10:11:57.4418391Z [W1204 09:28:57.567919399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4418394Z 2025-12-04T10:11:57.4418686Z [W1204 09:28:57.568071772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4418693Z 2025-12-04T10:11:57.4418983Z [W1204 09:28:57.568282506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4418986Z 2025-12-04T10:11:57.4419274Z [W1204 09:28:57.568423288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4419277Z 2025-12-04T10:11:57.4419565Z [W1204 09:28:57.568646442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4419572Z 2025-12-04T10:11:57.4419969Z [W1204 09:28:57.568772654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4419972Z 2025-12-04T10:11:57.4420261Z [W1204 09:28:57.568982418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4420352Z 2025-12-04T10:11:57.4420647Z [W1204 09:28:57.569108400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4420650Z 2025-12-04T10:11:57.4420939Z [W1204 09:28:57.575280378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4420942Z 2025-12-04T10:11:57.4421240Z [W1204 09:28:57.575456221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4421243Z 2025-12-04T10:11:57.4421536Z [W1204 09:28:57.575610083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4421540Z 2025-12-04T10:11:57.4421834Z [W1204 09:28:57.575814627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4421839Z 2025-12-04T10:11:57.4422126Z [W1204 09:28:57.575940029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4422129Z 2025-12-04T10:11:57.4422422Z [W1204 09:28:57.576154503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4422425Z 2025-12-04T10:11:57.4422713Z [W1204 09:28:57.576276895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4422716Z 2025-12-04T10:11:57.4423008Z [W1204 09:28:57.576495799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4423011Z 2025-12-04T10:11:57.4423305Z [W1204 09:28:57.576624211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4423311Z 2025-12-04T10:11:57.4423602Z [W1204 09:28:57.661066499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4423605Z 2025-12-04T10:11:57.4423900Z [W1204 09:28:57.661263263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4423903Z 2025-12-04T10:11:57.4424191Z [W1204 09:28:57.661419905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4424194Z 2025-12-04T10:11:57.4424489Z [W1204 09:28:57.661629549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4424493Z 2025-12-04T10:11:57.4424785Z [W1204 09:28:57.661753251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4424789Z 2025-12-04T10:11:57.4425086Z [W1204 09:28:57.661969145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4425089Z 2025-12-04T10:11:57.4425379Z [W1204 09:28:57.662093577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4425382Z 2025-12-04T10:11:57.4425672Z [W1204 09:28:57.662297521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4425681Z 2025-12-04T10:11:57.4425971Z [W1204 09:28:57.662419823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4426046Z 2025-12-04T10:11:57.4426110Z FAILED [0.5624s] [100%] 2025-12-04T10:11:57.4426114Z 2025-12-04T10:11:57.4426205Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4426519Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4426691Z Traceback (most recent call last): 2025-12-04T10:11:57.4427007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4427074Z method(*args, **kwargs) 2025-12-04T10:11:57.4427382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4427447Z method(*args, **kwargs) 2025-12-04T10:11:57.4427740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4427806Z with policy(): 2025-12-04T10:11:57.4428107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4428179Z raise RuntimeError(msg) 2025-12-04T10:11:57.4429005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.4429008Z 2025-12-04T10:11:57.4429142Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4429692Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4429695Z 2025-12-04T10:11:57.4429859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4429994Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4430094Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4430452Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4430589Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4430649Z graph_break [] 2025-12-04T10:11:57.4430783Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4431479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4431551Z if out == self.unknown_value: 2025-12-04T10:11:57.4431866Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4431943Z Traceback (most recent call last): 2025-12-04T10:11:57.4432252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4432324Z method(*args, **kwargs) 2025-12-04T10:11:57.4432622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4432693Z method(*args, **kwargs) 2025-12-04T10:11:57.4432985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4433046Z with policy(): 2025-12-04T10:11:57.4433423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4433491Z raise RuntimeError(msg) 2025-12-04T10:11:57.4434328Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4434398Z 2025-12-04T10:11:57.4434527Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4435066Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4435070Z 2025-12-04T10:11:57.4435229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4435359Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4435459Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4435808Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4435942Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4436000Z graph_break [] 2025-12-04T10:11:57.4436126Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4436823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4436892Z if out == self.unknown_value: 2025-12-04T10:11:57.4437016Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4437114Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4437242Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4437599Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4437661Z graph_break [] 2025-12-04T10:11:57.4437757Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4438074Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4438150Z Traceback (most recent call last): 2025-12-04T10:11:57.4438460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4438527Z method(*args, **kwargs) 2025-12-04T10:11:57.4438826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4438896Z method(*args, **kwargs) 2025-12-04T10:11:57.4439196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4439259Z with policy(): 2025-12-04T10:11:57.4439561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4439628Z raise RuntimeError(msg) 2025-12-04T10:11:57.4440514Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4440518Z 2025-12-04T10:11:57.4440724Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4441267Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4441334Z 2025-12-04T10:11:57.4441497Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4441625Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4441730Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4442083Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4442217Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4442277Z graph_break [] 2025-12-04T10:11:57.4442404Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4443102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4443178Z if out == self.unknown_value: 2025-12-04T10:11:57.4443302Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4443403Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4443526Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4443879Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4443938Z graph_break [] 2025-12-04T10:11:57.4444064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4444162Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4444288Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4444634Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4444704Z graph_break [] 2025-12-04T10:11:57.4445194Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml - 2025-12-04T10:11:57.4445301Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4446627Z FAILED [0.5624s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4446633Z 2025-12-04T10:11:57.4446777Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4447322Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4447325Z 2025-12-04T10:11:57.4447491Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4447598Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4447785Z ================== 1 failed, 57 deselected, 2 rerun in 12.23s ================== 2025-12-04T10:11:57.4447850Z Got exit code 1 2025-12-04T10:11:57.4448350Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4448659Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.4448938Z W1204 09:29:04.282000 35818 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4449327Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml 2025-12-04T10:11:57.4449428Z ============================= test session starts ============================== 2025-12-04T10:11:57.4449637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4449703Z cachedir: .pytest_cache 2025-12-04T10:11:57.4450018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4450097Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4450168Z 
configfile: pytest.ini 2025-12-04T10:11:57.4450491Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4450620Z collecting ... collected 58 items / 3 deselected / 55 selected 2025-12-04T10:11:57.4450714Z stepcurrent: skipping 3 already run items. 2025-12-04T10:11:57.4450785Z Running 55 items in this shard 2025-12-04T10:11:57.4450788Z 2025-12-04T10:11:57.4451315Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9943s] [ 1%] 2025-12-04T10:11:57.4451824Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5933s] [ 1%] 2025-12-04T10:11:57.4452287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5945s] [ 1%] 2025-12-04T10:11:57.4452293Z 2025-12-04T10:11:57.4452385Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4452698Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4452778Z Traceback (most recent call last): 2025-12-04T10:11:57.4453095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4453160Z method(*args, **kwargs) 2025-12-04T10:11:57.4453467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4453531Z method(*args, **kwargs) 2025-12-04T10:11:57.4453832Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4453897Z with policy(): 2025-12-04T10:11:57.4454190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4454268Z raise RuntimeError(msg) 2025-12-04T10:11:57.4455093Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4455097Z 2025-12-04T10:11:57.4455305Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4455844Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4455929Z 2025-12-04T10:11:57.4456090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4456222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4456319Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4456678Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4456807Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4456868Z graph_break [] 2025-12-04T10:11:57.4457186Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4457259Z Traceback (most recent call last): 
2025-12-04T10:11:57.4457559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4457633Z method(*args, **kwargs) 2025-12-04T10:11:57.4457928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4457996Z method(*args, **kwargs) 2025-12-04T10:11:57.4458288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4458348Z with policy(): 2025-12-04T10:11:57.4458661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4458728Z raise RuntimeError(msg) 2025-12-04T10:11:57.4459562Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4459568Z 2025-12-04T10:11:57.4459698Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4460232Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4460238Z 2025-12-04T10:11:57.4460397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4460527Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4460627Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4460979Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4461104Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4461174Z graph_break [] 2025-12-04T10:11:57.4461298Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4461395Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4461519Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4461864Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4461930Z graph_break [] 2025-12-04T10:11:57.4462014Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4462396Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4462475Z Traceback (most recent call last): 2025-12-04T10:11:57.4462775Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4462916Z method(*args, **kwargs) 2025-12-04T10:11:57.4463213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4463279Z method(*args, **kwargs) 2025-12-04T10:11:57.4463572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4463631Z with policy(): 2025-12-04T10:11:57.4463928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4463993Z raise RuntimeError(msg) 2025-12-04T10:11:57.4464828Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4464834Z 2025-12-04T10:11:57.4464963Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4465496Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4465501Z 2025-12-04T10:11:57.4465660Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4465787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4465884Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4466235Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4466360Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4466425Z graph_break [] 2025-12-04T10:11:57.4466551Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4466643Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4466768Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4467112Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4467170Z graph_break [] 2025-12-04T10:11:57.4467299Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4467396Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4467525Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4467873Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), 
('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4467935Z graph_break [] 2025-12-04T10:11:57.4468427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml - 2025-12-04T10:11:57.4468526Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4469926Z FAILED [0.5945s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4469993Z 2025-12-04T10:11:57.4470120Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4470662Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4470665Z 2025-12-04T10:11:57.4470820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4470925Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4471044Z =================== 1 failed, 3 deselected, 2 rerun in 3.21s =================== 2025-12-04T10:11:57.4471104Z Got exit code 1 2025-12-04T10:11:57.4471175Z Retrying single test... 
2025-12-04T10:11:57.4471439Z W1204 09:29:14.020000 36000 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4471823Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml 2025-12-04T10:11:57.4471927Z ============================= test session starts ============================== 2025-12-04T10:11:57.4472136Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4472204Z cachedir: .pytest_cache 2025-12-04T10:11:57.4472517Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4472595Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4472665Z configfile: pytest.ini 2025-12-04T10:11:57.4472990Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4473121Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4473712Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4473786Z Running 1 items in this shard 2025-12-04T10:11:57.4473789Z 2025-12-04T10:11:57.4474633Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:15.133219758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4474637Z 2025-12-04T10:11:57.4474943Z [W1204 09:29:24.293553230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4474947Z 2025-12-04T10:11:57.4475245Z [W1204 09:29:24.293815624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4475252Z 2025-12-04T10:11:57.4475543Z [W1204 09:29:24.294385544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4475547Z 2025-12-04T10:11:57.4475838Z [W1204 09:29:24.294581377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4475846Z 2025-12-04T10:11:57.4476135Z [W1204 09:29:24.295844949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4476139Z 2025-12-04T10:11:57.4476500Z [W1204 09:29:24.296006222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4476504Z 2025-12-04T10:11:57.4476801Z [W1204 09:29:24.296277006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4476804Z 2025-12-04T10:11:57.4477166Z [W1204 09:29:24.296441719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4477170Z 2025-12-04T10:11:57.4477467Z [W1204 09:29:24.304929983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4477470Z 2025-12-04T10:11:57.4477759Z [W1204 09:29:24.305150487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4477763Z 2025-12-04T10:11:57.4478062Z [W1204 09:29:24.305330740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4478068Z 2025-12-04T10:11:57.4478361Z [W1204 09:29:24.305570494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4478364Z 2025-12-04T10:11:57.4478659Z [W1204 09:29:24.305722347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4478665Z 2025-12-04T10:11:57.4478953Z [W1204 09:29:24.305964381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4478956Z 2025-12-04T10:11:57.4479247Z [W1204 09:29:24.306107853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4479257Z 2025-12-04T10:11:57.4479546Z [W1204 09:29:24.306340477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4479550Z 2025-12-04T10:11:57.4479843Z [W1204 09:29:24.306492980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4479846Z 2025-12-04T10:11:57.4480189Z [W1204 09:29:24.394751314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4480195Z 2025-12-04T10:11:57.4480485Z [W1204 09:29:24.394969998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4480488Z 2025-12-04T10:11:57.4480783Z [W1204 09:29:24.395122420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4480786Z 2025-12-04T10:11:57.4481077Z [W1204 09:29:24.395334164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4481080Z 2025-12-04T10:11:57.4481377Z [W1204 09:29:24.395464936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4481380Z 2025-12-04T10:11:57.4481670Z [W1204 09:29:24.395683800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4481676Z 2025-12-04T10:11:57.4481966Z [W1204 09:29:24.395810332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4481969Z 2025-12-04T10:11:57.4482260Z [W1204 09:29:24.396011855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4482262Z 2025-12-04T10:11:57.4482550Z [W1204 09:29:24.396135697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4482553Z 2025-12-04T10:11:57.4482728Z ('RERUN', {'yellow': True}) [11.1854s] [100%] 2025-12-04T10:11:57.4483478Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:25.634565905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4483546Z 2025-12-04T10:11:57.4483846Z [W1204 09:29:25.634822739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4483849Z 2025-12-04T10:11:57.4484139Z [W1204 09:29:25.634977952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4484142Z 2025-12-04T10:11:57.4484437Z [W1204 09:29:25.635185106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4484441Z 2025-12-04T10:11:57.4484735Z [W1204 09:29:25.635317928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4484738Z 2025-12-04T10:11:57.4485030Z [W1204 09:29:25.635541432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4485036Z 2025-12-04T10:11:57.4485325Z [W1204 09:29:25.635668464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4485328Z 2025-12-04T10:11:57.4485617Z [W1204 09:29:25.635874767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4485624Z 2025-12-04T10:11:57.4485911Z [W1204 09:29:25.636004220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4485914Z 2025-12-04T10:11:57.4486204Z [W1204 09:29:25.642237786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4486207Z 2025-12-04T10:11:57.4486498Z [W1204 09:29:25.642413609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4486504Z 2025-12-04T10:11:57.4486794Z [W1204 09:29:25.642563532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4486798Z 2025-12-04T10:11:57.4487101Z [W1204 09:29:25.642767225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4487105Z 2025-12-04T10:11:57.4487400Z [W1204 09:29:25.642899597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4487403Z 2025-12-04T10:11:57.4487697Z [W1204 09:29:25.643117051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4487700Z 2025-12-04T10:11:57.4487994Z [W1204 09:29:25.643241983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4488000Z 2025-12-04T10:11:57.4488296Z [W1204 09:29:25.643449267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4488299Z 2025-12-04T10:11:57.4488587Z [W1204 09:29:25.643573179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4488590Z 2025-12-04T10:11:57.4488880Z [W1204 09:29:25.727352767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4488888Z 2025-12-04T10:11:57.4489249Z [W1204 09:29:25.727573291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4489253Z 2025-12-04T10:11:57.4489544Z [W1204 09:29:25.727725214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4489547Z 2025-12-04T10:11:57.4489907Z [W1204 09:29:25.727931568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4489910Z 2025-12-04T10:11:57.4490198Z [W1204 09:29:25.728059870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4490201Z 2025-12-04T10:11:57.4490495Z [W1204 09:29:25.728277493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4490499Z 2025-12-04T10:11:57.4490789Z [W1204 09:29:25.728412496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4490795Z 2025-12-04T10:11:57.4491090Z [W1204 09:29:25.728624319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4491093Z 2025-12-04T10:11:57.4491384Z [W1204 09:29:25.728746041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4491390Z 2025-12-04T10:11:57.4491471Z ('RERUN', {'yellow': True}) [0.5660s] [100%] 2025-12-04T10:11:57.4492216Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:26.198356901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4492220Z 2025-12-04T10:11:57.4492513Z [W1204 09:29:26.198567205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4492516Z 2025-12-04T10:11:57.4492807Z [W1204 09:29:26.198713397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4492810Z 2025-12-04T10:11:57.4493102Z [W1204 09:29:26.198922761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4493105Z 2025-12-04T10:11:57.4493396Z [W1204 09:29:26.199046873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4493399Z 2025-12-04T10:11:57.4493690Z [W1204 09:29:26.199264777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4493694Z 2025-12-04T10:11:57.4493989Z [W1204 09:29:26.199387319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4493995Z 2025-12-04T10:11:57.4494284Z [W1204 09:29:26.199592792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4494288Z 2025-12-04T10:11:57.4494581Z [W1204 09:29:26.199714855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4494587Z 2025-12-04T10:11:57.4494880Z [W1204 09:29:26.205807448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4494884Z 2025-12-04T10:11:57.4495174Z [W1204 09:29:26.205981942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4495181Z 2025-12-04T10:11:57.4495471Z [W1204 09:29:26.206133734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4495474Z 2025-12-04T10:11:57.4495911Z [W1204 09:29:26.206334868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4495915Z 2025-12-04T10:11:57.4496211Z [W1204 09:29:26.206462330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4496283Z 2025-12-04T10:11:57.4496573Z [W1204 09:29:26.206675963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4496576Z 2025-12-04T10:11:57.4496872Z [W1204 09:29:26.206800385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4496875Z 2025-12-04T10:11:57.4497168Z [W1204 09:29:26.207004609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4497171Z 2025-12-04T10:11:57.4497466Z [W1204 09:29:26.207127091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4497469Z 2025-12-04T10:11:57.4497756Z [W1204 09:29:26.290621435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4497762Z 2025-12-04T10:11:57.4498055Z [W1204 09:29:26.290824858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4498058Z 2025-12-04T10:11:57.4498347Z [W1204 09:29:26.290972741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4498350Z 2025-12-04T10:11:57.4498641Z [W1204 09:29:26.291184785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4498647Z 2025-12-04T10:11:57.4498937Z [W1204 09:29:26.291310757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4498940Z 2025-12-04T10:11:57.4499239Z [W1204 09:29:26.291525921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4499245Z 2025-12-04T10:11:57.4499546Z [W1204 09:29:26.291654513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4499550Z 2025-12-04T10:11:57.4499838Z [W1204 09:29:26.291854716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4499842Z 2025-12-04T10:11:57.4500136Z [W1204 09:29:26.291979238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4500139Z 2025-12-04T10:11:57.4500204Z FAILED [0.5608s] [100%] 2025-12-04T10:11:57.4500207Z 2025-12-04T10:11:57.4500307Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4500628Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4500707Z Traceback (most recent call last): 2025-12-04T10:11:57.4501023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4501092Z method(*args, **kwargs) 2025-12-04T10:11:57.4501391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4501458Z method(*args, **kwargs) 2025-12-04T10:11:57.4501745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4501810Z with policy(): 2025-12-04T10:11:57.4502195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.4502263Z raise RuntimeError(msg) 2025-12-04T10:11:57.4503089Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4503157Z 2025-12-04T10:11:57.4503290Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4503851Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4503855Z 2025-12-04T10:11:57.4504027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4504161Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4504266Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4504626Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4504765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4504827Z graph_break [] 2025-12-04T10:11:57.4504952Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4505656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4505728Z if out == self.unknown_value: 2025-12-04T10:11:57.4506048Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4506123Z Traceback (most recent call last): 2025-12-04T10:11:57.4506429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4506503Z method(*args, **kwargs) 2025-12-04T10:11:57.4506797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4506865Z method(*args, **kwargs) 2025-12-04T10:11:57.4507156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4507216Z with policy(): 2025-12-04T10:11:57.4507513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4507594Z raise RuntimeError(msg) 2025-12-04T10:11:57.4508434Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4508445Z 2025-12-04T10:11:57.4508573Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4509112Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4509115Z 2025-12-04T10:11:57.4509282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4509408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4509599Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4509953Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4510083Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4510211Z graph_break [] 2025-12-04T10:11:57.4510338Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4511035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4511112Z if out == self.unknown_value: 2025-12-04T10:11:57.4511234Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4511333Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4511461Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4511805Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4511871Z graph_break [] 2025-12-04T10:11:57.4511954Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4512269Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4512346Z Traceback (most recent call last): 2025-12-04T10:11:57.4512644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4512713Z method(*args, **kwargs) 2025-12-04T10:11:57.4513019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4513087Z method(*args, **kwargs) 2025-12-04T10:11:57.4513384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4513445Z with policy(): 2025-12-04T10:11:57.4513750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4513816Z raise RuntimeError(msg) 2025-12-04T10:11:57.4514650Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4514657Z 2025-12-04T10:11:57.4514786Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4515329Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4515333Z 2025-12-04T10:11:57.4515498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4515628Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4515730Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4516076Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4516204Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4516268Z graph_break [] 2025-12-04T10:11:57.4516393Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4517338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4517422Z if out == self.unknown_value: 2025-12-04T10:11:57.4517638Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4517739Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4517868Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4518218Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4518283Z graph_break [] 2025-12-04T10:11:57.4518415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4518517Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4518644Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4518990Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4519057Z graph_break [] 2025-12-04T10:11:57.4519549Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml - 2025-12-04T10:11:57.4519650Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4521072Z FAILED [0.5608s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4521078Z 2025-12-04T10:11:57.4521205Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4521750Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4521754Z 2025-12-04T10:11:57.4521912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4522024Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4522140Z ================== 1 failed, 57 deselected, 2 rerun in 12.34s ================== 2025-12-04T10:11:57.4522199Z Got exit code 1 2025-12-04T10:11:57.4522270Z Retrying single test... 2025-12-04T10:11:57.4522540Z W1204 09:29:32.938000 36187 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4522927Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml 2025-12-04T10:11:57.4523026Z ============================= test session starts ============================== 2025-12-04T10:11:57.4523242Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4523316Z cachedir: .pytest_cache 2025-12-04T10:11:57.4523624Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4523704Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4523771Z configfile: pytest.ini 2025-12-04T10:11:57.4524160Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4524297Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4524881Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4525018Z Running 1 items in this shard 2025-12-04T10:11:57.4525026Z 2025-12-04T10:11:57.4525778Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:34.052767129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4525782Z 2025-12-04T10:11:57.4526087Z [W1204 09:29:43.242482605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4526091Z 2025-12-04T10:11:57.4526386Z [W1204 09:29:43.242745570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4526390Z 2025-12-04T10:11:57.4526685Z [W1204 09:29:43.243328300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4526691Z 2025-12-04T10:11:57.4526984Z [W1204 09:29:43.243514863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4526988Z 2025-12-04T10:11:57.4527289Z [W1204 09:29:43.244787256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4527293Z 2025-12-04T10:11:57.4527589Z [W1204 09:29:43.244984599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4527593Z 2025-12-04T10:11:57.4527885Z [W1204 09:29:43.245307075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4527888Z 2025-12-04T10:11:57.4528180Z [W1204 09:29:43.245458767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4528185Z 2025-12-04T10:11:57.4528471Z [W1204 09:29:43.253941066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4528474Z 2025-12-04T10:11:57.4528762Z [W1204 09:29:43.254161950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4528768Z 2025-12-04T10:11:57.4529056Z [W1204 09:29:43.254337853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4529059Z 2025-12-04T10:11:57.4529351Z [W1204 09:29:43.254584387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4529354Z 2025-12-04T10:11:57.4529645Z [W1204 09:29:43.254730609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4529651Z 2025-12-04T10:11:57.4529941Z [W1204 09:29:43.254970704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4529944Z 2025-12-04T10:11:57.4530238Z [W1204 09:29:43.255110466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4530241Z 2025-12-04T10:11:57.4530529Z [W1204 09:29:43.255341160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4530532Z 2025-12-04T10:11:57.4530894Z [W1204 09:29:43.255477882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4530898Z 2025-12-04T10:11:57.4531190Z [W1204 09:29:43.344144693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4531276Z 2025-12-04T10:11:57.4531570Z [W1204 09:29:43.344381757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4531573Z 2025-12-04T10:11:57.4531862Z [W1204 09:29:43.344535970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4531866Z 2025-12-04T10:11:57.4532155Z [W1204 09:29:43.344748773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4532164Z 2025-12-04T10:11:57.4532455Z [W1204 09:29:43.344879356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4532458Z 2025-12-04T10:11:57.4532745Z [W1204 09:29:43.345103300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4532751Z 2025-12-04T10:11:57.4533042Z [W1204 09:29:43.345230662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4533045Z 2025-12-04T10:11:57.4533333Z [W1204 09:29:43.345441426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4533337Z 2025-12-04T10:11:57.4533633Z [W1204 09:29:43.345567788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4533636Z 2025-12-04T10:11:57.4533722Z ('RERUN', {'yellow': True}) [11.2184s] [100%] 2025-12-04T10:11:57.4534472Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:44.583126734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4534478Z 2025-12-04T10:11:57.4534768Z [W1204 09:29:44.583388908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4534771Z 2025-12-04T10:11:57.4535074Z [W1204 09:29:44.583548861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4535077Z 2025-12-04T10:11:57.4535373Z [W1204 09:29:44.583761845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4535377Z 2025-12-04T10:11:57.4535667Z [W1204 09:29:44.583892267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4535671Z 2025-12-04T10:11:57.4535963Z [W1204 09:29:44.584113651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4535966Z 2025-12-04T10:11:57.4536258Z [W1204 09:29:44.584244483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4536261Z 2025-12-04T10:11:57.4536554Z [W1204 09:29:44.584460587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4536557Z 2025-12-04T10:11:57.4536847Z [W1204 09:29:44.584585529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4536850Z 2025-12-04T10:11:57.4537142Z [W1204 09:29:44.590795267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4537215Z 2025-12-04T10:11:57.4537504Z [W1204 09:29:44.590970360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4537507Z 2025-12-04T10:11:57.4537799Z [W1204 09:29:44.591120033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4537866Z 2025-12-04T10:11:57.4538155Z [W1204 09:29:44.591320916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4538159Z 2025-12-04T10:11:57.4538448Z [W1204 09:29:44.591444008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4538455Z 2025-12-04T10:11:57.4538742Z [W1204 09:29:44.591661602 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4538745Z 2025-12-04T10:11:57.4539035Z [W1204 09:29:44.591789954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4539039Z 2025-12-04T10:11:57.4539330Z [W1204 09:29:44.592000458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4539336Z 2025-12-04T10:11:57.4539623Z [W1204 09:29:44.592126380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4539626Z 2025-12-04T10:11:57.4539923Z [W1204 09:29:44.675820655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4539926Z 2025-12-04T10:11:57.4540213Z [W1204 09:29:44.676042209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4540216Z 2025-12-04T10:11:57.4540511Z [W1204 09:29:44.676195102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4540514Z 2025-12-04T10:11:57.4540806Z [W1204 09:29:44.676410555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4540812Z 2025-12-04T10:11:57.4541103Z [W1204 09:29:44.676532587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4541106Z 2025-12-04T10:11:57.4541394Z [W1204 09:29:44.676751701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4541397Z 2025-12-04T10:11:57.4541687Z [W1204 09:29:44.676875463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4541696Z 2025-12-04T10:11:57.4541986Z [W1204 09:29:44.677083917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4541989Z 2025-12-04T10:11:57.4542276Z [W1204 09:29:44.677205259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4542282Z 2025-12-04T10:11:57.4542367Z ('RERUN', {'yellow': True}) [0.5642s] [100%] 2025-12-04T10:11:57.4543104Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:45.145458829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4543108Z 2025-12-04T10:11:57.4543401Z [W1204 09:29:45.145676883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4543404Z 2025-12-04T10:11:57.4543762Z [W1204 09:29:45.145835185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4543766Z 2025-12-04T10:11:57.4544060Z [W1204 09:29:45.146048669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4544127Z 2025-12-04T10:11:57.4544414Z [W1204 09:29:45.146181322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4544417Z 2025-12-04T10:11:57.4544708Z [W1204 09:29:45.146401606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4544711Z 2025-12-04T10:11:57.4544999Z [W1204 09:29:45.146530008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4545002Z 2025-12-04T10:11:57.4545293Z [W1204 09:29:45.146738211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4545296Z 2025-12-04T10:11:57.4545588Z [W1204 09:29:45.146861434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4545594Z 2025-12-04T10:11:57.4545887Z [W1204 09:29:45.152931389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4545891Z 2025-12-04T10:11:57.4546180Z [W1204 09:29:45.153108692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4546183Z 2025-12-04T10:11:57.4546484Z [W1204 09:29:45.153261295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4546487Z 2025-12-04T10:11:57.4546785Z [W1204 09:29:45.153464249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4546788Z 2025-12-04T10:11:57.4547078Z [W1204 09:29:45.153586691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4547084Z 2025-12-04T10:11:57.4547378Z [W1204 09:29:45.153802215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4547381Z 2025-12-04T10:11:57.4547670Z [W1204 09:29:45.153926817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4547673Z 2025-12-04T10:11:57.4547960Z [W1204 09:29:45.154131021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4547968Z 2025-12-04T10:11:57.4548259Z [W1204 09:29:45.154252393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4548262Z 2025-12-04T10:11:57.4548551Z [W1204 09:29:45.237569960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4548554Z 2025-12-04T10:11:57.4548852Z [W1204 09:29:45.237759713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4548855Z 2025-12-04T10:11:57.4549145Z [W1204 09:29:45.237912626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4549148Z 2025-12-04T10:11:57.4549448Z [W1204 09:29:45.238120079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4549451Z 2025-12-04T10:11:57.4549810Z [W1204 09:29:45.238245092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4549813Z 2025-12-04T10:11:57.4550112Z [W1204 09:29:45.238459505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4550115Z 2025-12-04T10:11:57.4550405Z [W1204 09:29:45.238580998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4550472Z 2025-12-04T10:11:57.4550770Z [W1204 09:29:45.238781881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4550773Z 2025-12-04T10:11:57.4551063Z [W1204 09:29:45.238900733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4551066Z 2025-12-04T10:11:57.4551130Z FAILED [0.5590s] [100%] 2025-12-04T10:11:57.4551133Z 2025-12-04T10:11:57.4551225Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4551545Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4551628Z Traceback (most recent call last): 2025-12-04T10:11:57.4551943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4552014Z method(*args, **kwargs) 2025-12-04T10:11:57.4552320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4552386Z method(*args, **kwargs) 2025-12-04T10:11:57.4552679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4552746Z with policy(): 2025-12-04T10:11:57.4553041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4553114Z raise RuntimeError(msg) 2025-12-04T10:11:57.4553940Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.4553947Z 2025-12-04T10:11:57.4554083Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4554635Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4554638Z 2025-12-04T10:11:57.4554807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4554943Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4555045Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4555409Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4555550Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4555613Z graph_break [] 2025-12-04T10:11:57.4555745Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4556446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4556526Z if out == self.unknown_value: 2025-12-04T10:11:57.4556911Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4556989Z Traceback (most recent call last): 2025-12-04T10:11:57.4557303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4557369Z method(*args, **kwargs) 2025-12-04T10:11:57.4557752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4557822Z method(*args, **kwargs) 2025-12-04T10:11:57.4558121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4558186Z with policy(): 2025-12-04T10:11:57.4558485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4558552Z raise RuntimeError(msg) 2025-12-04T10:11:57.4559396Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4559402Z 2025-12-04T10:11:57.4559532Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4560129Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4560133Z 2025-12-04T10:11:57.4560295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4560424Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4560536Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4560895Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4561031Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4561091Z graph_break [] 2025-12-04T10:11:57.4561220Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4561917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4561990Z if out == self.unknown_value: 2025-12-04T10:11:57.4562123Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4562219Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4562347Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4562703Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4562764Z graph_break [] 2025-12-04T10:11:57.4562848Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4563167Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4563242Z Traceback (most recent call last): 2025-12-04T10:11:57.4563552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4563619Z method(*args, **kwargs) 2025-12-04T10:11:57.4563917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4563988Z method(*args, **kwargs) 2025-12-04T10:11:57.4564353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4564422Z with policy(): 2025-12-04T10:11:57.4564721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4564854Z raise RuntimeError(msg) 2025-12-04T10:11:57.4565692Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4565696Z 2025-12-04T10:11:57.4565825Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4566369Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4566373Z 2025-12-04T10:11:57.4566533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4566672Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4566783Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4567134Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4567265Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4567325Z graph_break [] 2025-12-04T10:11:57.4567449Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4568149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4568223Z if out == self.unknown_value: 2025-12-04T10:11:57.4568351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4568451Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4568576Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4568933Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4568993Z graph_break [] 2025-12-04T10:11:57.4569120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4569221Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:11:57.4569344Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4569695Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:11:57.4569755Z graph_break [] 2025-12-04T10:11:57.4570245Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml - 2025-12-04T10:11:57.4570359Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4571753Z FAILED [0.5590s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4571758Z 2025-12-04T10:11:57.4571892Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4572432Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4572500Z 2025-12-04T10:11:57.4572664Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4572772Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4572889Z ================== 1 failed, 57 deselected, 2 rerun in 12.37s ================== 2025-12-04T10:11:57.4572957Z Got exit code 1 2025-12-04T10:11:57.4573454Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4573710Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.4573978Z W1204 09:29:51.887000 36374 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4574371Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml 2025-12-04T10:11:57.4574477Z ============================= test session starts ============================== 2025-12-04T10:11:57.4574690Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4574764Z cachedir: .pytest_cache 2025-12-04T10:11:57.4575075Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4575154Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4575230Z 
configfile: pytest.ini 2025-12-04T10:11:57.4575547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4575678Z collecting ... collected 58 items / 4 deselected / 54 selected 2025-12-04T10:11:57.4575788Z stepcurrent: skipping 4 already run items. 2025-12-04T10:11:57.4575861Z Running 54 items in this shard 2025-12-04T10:11:57.4575864Z 2025-12-04T10:11:57.4576379Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0557s] [ 1%] 2025-12-04T10:11:57.4576872Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6275s] [ 1%] 2025-12-04T10:11:57.4577331Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6189s] [ 1%] 2025-12-04T10:11:57.4577342Z 2025-12-04T10:11:57.4577426Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4577733Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4577814Z Traceback (most recent call last): 2025-12-04T10:11:57.4578122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4578192Z method(*args, **kwargs) 2025-12-04T10:11:57.4578492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4578557Z method(*args, **kwargs) 2025-12-04T10:11:57.4578923Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4578986Z with policy(): 2025-12-04T10:11:57.4579285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4579427Z raise RuntimeError(msg) 2025-12-04T10:11:57.4580248Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4580252Z 2025-12-04T10:11:57.4580387Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4580913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4580917Z 2025-12-04T10:11:57.4581076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4581209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4581309Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4581667Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4581795Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4581854Z graph_break [] 2025-12-04T10:11:57.4582156Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4582233Z Traceback (most recent call last): 
2025-12-04T10:11:57.4582540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4582607Z method(*args, **kwargs) 2025-12-04T10:11:57.4582897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4582970Z method(*args, **kwargs) 2025-12-04T10:11:57.4583260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4583320Z with policy(): 2025-12-04T10:11:57.4583622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4583689Z raise RuntimeError(msg) 2025-12-04T10:11:57.4584534Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4584537Z 2025-12-04T10:11:57.4584664Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4585194Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4585200Z 2025-12-04T10:11:57.4585358Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4585488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4585599Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4585950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4586170Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4586232Z graph_break [] 2025-12-04T10:11:57.4586358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4586455Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4586648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4586995Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4587061Z graph_break [] 2025-12-04T10:11:57.4587147Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4587449Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4587523Z Traceback (most recent call last): 2025-12-04T10:11:57.4587826Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4587910Z method(*args, **kwargs) 2025-12-04T10:11:57.4588206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4588274Z method(*args, **kwargs) 2025-12-04T10:11:57.4588571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4588632Z with policy(): 2025-12-04T10:11:57.4588934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4589004Z raise RuntimeError(msg) 2025-12-04T10:11:57.4589838Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4589843Z 2025-12-04T10:11:57.4589976Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4590501Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4590508Z 2025-12-04T10:11:57.4590672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4590799Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4590890Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4591242Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4591371Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4591444Z graph_break [] 2025-12-04T10:11:57.4591576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4591670Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4591803Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4592147Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4592216Z graph_break [] 2025-12-04T10:11:57.4592343Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4592440Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4592569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4592987Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4593050Z graph_break [] 2025-12-04T10:11:57.4593549Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml - 2025-12-04T10:11:57.4593717Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4595039Z FAILED [0.6189s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4595046Z 2025-12-04T10:11:57.4595174Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4595709Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4595715Z 2025-12-04T10:11:57.4595876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4595982Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4596118Z =================== 1 failed, 4 deselected, 2 rerun in 3.33s =================== 2025-12-04T10:11:57.4596181Z Got exit code 1 2025-12-04T10:11:57.4596254Z Retrying single test... 
2025-12-04T10:11:57.4596519Z W1204 09:30:01.748000 36563 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4596911Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml 2025-12-04T10:11:57.4597015Z ============================= test session starts ============================== 2025-12-04T10:11:57.4597227Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4597300Z cachedir: .pytest_cache 2025-12-04T10:11:57.4597611Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4597688Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4597758Z configfile: pytest.ini 2025-12-04T10:11:57.4598076Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4598207Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4598790Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4598863Z Running 1 items in this shard 2025-12-04T10:11:57.4598867Z 2025-12-04T10:11:57.4599614Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:03.182918143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4599618Z 2025-12-04T10:11:57.4599962Z [W1204 09:30:12.473511348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4599966Z 2025-12-04T10:11:57.4600270Z [W1204 09:30:12.473769112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4600274Z 2025-12-04T10:11:57.4600640Z [W1204 09:30:12.479933278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4600643Z 2025-12-04T10:11:57.4600939Z [W1204 09:30:12.480570129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4601082Z 2025-12-04T10:11:57.4601374Z [W1204 09:30:12.480762912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4601377Z 2025-12-04T10:11:57.4601664Z [W1204 09:30:12.486267086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4601668Z 2025-12-04T10:11:57.4601961Z [W1204 09:30:12.486829705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4601965Z 2025-12-04T10:11:57.4602257Z [W1204 09:30:12.487008978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4602260Z 2025-12-04T10:11:57.4602348Z ('RERUN', {'yellow': True}) [11.3528s] [100%] 2025-12-04T10:11:57.4603095Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:13.665245296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4603102Z 2025-12-04T10:11:57.4603402Z [W1204 09:30:13.665786476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4603405Z 2025-12-04T10:11:57.4603696Z [W1204 09:30:13.665931688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4603699Z 2025-12-04T10:11:57.4603996Z [W1204 09:30:13.668838368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4603999Z 2025-12-04T10:11:57.4604293Z [W1204 09:30:13.669401907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4604299Z 2025-12-04T10:11:57.4604589Z [W1204 09:30:13.669543160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4604597Z 2025-12-04T10:11:57.4604884Z [W1204 09:30:13.674047227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4604887Z 2025-12-04T10:11:57.4605174Z [W1204 09:30:13.674512786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4605177Z 2025-12-04T10:11:57.4605474Z [W1204 09:30:13.674649388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4605477Z 2025-12-04T10:11:57.4605558Z ('RERUN', {'yellow': True}) [0.5986s] [100%] 2025-12-04T10:11:57.4606295Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:14.257175233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4606301Z 2025-12-04T10:11:57.4606591Z [W1204 09:30:14.257719193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4606594Z 2025-12-04T10:11:57.4606887Z [W1204 09:30:14.257863675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4606890Z 2025-12-04T10:11:57.4607247Z [W1204 09:30:14.260799495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4607251Z 2025-12-04T10:11:57.4607548Z [W1204 09:30:14.261363075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4607623Z 2025-12-04T10:11:57.4607918Z [W1204 09:30:14.261502938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4607922Z 2025-12-04T10:11:57.4608211Z [W1204 09:30:14.265992665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4608220Z 2025-12-04T10:11:57.4608510Z [W1204 09:30:14.266453993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4608513Z 2025-12-04T10:11:57.4608806Z [W1204 09:30:14.266590625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4608809Z 2025-12-04T10:11:57.4608880Z FAILED [0.5960s] [100%] 2025-12-04T10:11:57.4608884Z 2025-12-04T10:11:57.4608969Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4609277Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4609356Z Traceback (most recent call last): 2025-12-04T10:11:57.4609664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4609740Z method(*args, **kwargs) 2025-12-04T10:11:57.4610040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4610106Z method(*args, **kwargs) 2025-12-04T10:11:57.4610410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4610472Z with policy(): 2025-12-04T10:11:57.4610774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4610844Z raise RuntimeError(msg) 2025-12-04T10:11:57.4611661Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4611670Z 2025-12-04T10:11:57.4611800Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4612329Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4612333Z 2025-12-04T10:11:57.4612499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4612628Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4612734Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4613094Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4613226Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4613294Z graph_break [] 2025-12-04T10:11:57.4613420Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4614217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4614301Z if out == self.unknown_value: 2025-12-04T10:11:57.4614601Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4614749Z Traceback (most recent call last): 2025-12-04T10:11:57.4615055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4615121Z method(*args, **kwargs) 2025-12-04T10:11:57.4615424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4615490Z method(*args, **kwargs) 2025-12-04T10:11:57.4615790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4615850Z with policy(): 2025-12-04T10:11:57.4616152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4616225Z raise RuntimeError(msg) 2025-12-04T10:11:57.4617191Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4617198Z 2025-12-04T10:11:57.4617335Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4617864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4617868Z 2025-12-04T10:11:57.4618031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4618179Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4618277Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4618636Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4618765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4618825Z graph_break [] 2025-12-04T10:11:57.4618957Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4619648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4619726Z if out == self.unknown_value: 2025-12-04T10:11:57.4619856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4619951Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4620086Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4620432Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4620493Z graph_break [] 2025-12-04T10:11:57.4620586Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4620885Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4620968Z Traceback (most recent call last): 2025-12-04T10:11:57.4621265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4621438Z method(*args, **kwargs) 2025-12-04T10:11:57.4621739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4621805Z method(*args, **kwargs) 2025-12-04T10:11:57.4622095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4622252Z with policy(): 2025-12-04T10:11:57.4622548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4622622Z raise RuntimeError(msg) 2025-12-04T10:11:57.4623450Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4623456Z 2025-12-04T10:11:57.4623587Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4624112Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4624118Z 2025-12-04T10:11:57.4624280Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4624412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4624502Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4624852Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4624979Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4625041Z graph_break [] 2025-12-04T10:11:57.4625173Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4625865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4625941Z if out == self.unknown_value: 2025-12-04T10:11:57.4626069Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4626161Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4626289Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4626630Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4626689Z graph_break [] 2025-12-04T10:11:57.4626824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4626914Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4627035Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4627381Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4627442Z graph_break [] 2025-12-04T10:11:57.4627935Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml - 2025-12-04T10:11:57.4628037Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4629434Z FAILED [0.5960s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4629532Z 2025-12-04T10:11:57.4629663Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4630193Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4630197Z 2025-12-04T10:11:57.4630356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4630462Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4630584Z ================== 1 failed, 57 deselected, 2 rerun in 12.57s ================== 2025-12-04T10:11:57.4630643Z Got exit code 1 2025-12-04T10:11:57.4630709Z Retrying single test... 2025-12-04T10:11:57.4630975Z W1204 09:30:20.920000 36757 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4631367Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml 2025-12-04T10:11:57.4631469Z ============================= test session starts ============================== 2025-12-04T10:11:57.4631680Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4631752Z cachedir: .pytest_cache 2025-12-04T10:11:57.4632063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4632142Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4632209Z configfile: pytest.ini 2025-12-04T10:11:57.4632532Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4632661Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4633244Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4633316Z Running 1 items in this shard 2025-12-04T10:11:57.4633319Z 2025-12-04T10:11:57.4634067Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:22.416789240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4634071Z 2025-12-04T10:11:57.4634373Z [W1204 09:30:31.402910727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4634377Z 2025-12-04T10:11:57.4634675Z [W1204 09:30:31.403153742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4634686Z 2025-12-04T10:11:57.4634978Z [W1204 09:30:31.408909661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4634982Z 2025-12-04T10:11:57.4635274Z [W1204 09:30:31.409459280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4635277Z 2025-12-04T10:11:57.4635586Z [W1204 09:30:31.409627403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4635590Z 2025-12-04T10:11:57.4635954Z [W1204 09:30:31.415148647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4635958Z 2025-12-04T10:11:57.4636256Z [W1204 09:30:31.415691286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4636342Z 2025-12-04T10:11:57.4636632Z [W1204 09:30:31.415870100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4636635Z 2025-12-04T10:11:57.4636724Z ('RERUN', {'yellow': True}) [11.1094s] [100%] 2025-12-04T10:11:57.4637458Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:32.592067768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4637461Z 2025-12-04T10:11:57.4637777Z [W1204 09:30:32.592610047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4637780Z 2025-12-04T10:11:57.4638078Z [W1204 09:30:32.592775580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4638084Z 2025-12-04T10:11:57.4638375Z [W1204 09:30:32.595692400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4638384Z 2025-12-04T10:11:57.4638675Z [W1204 09:30:32.596252009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4638678Z 2025-12-04T10:11:57.4650325Z [W1204 09:30:32.596410352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4650337Z 2025-12-04T10:11:57.4650694Z [W1204 09:30:32.600974790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4650698Z 2025-12-04T10:11:57.4651018Z [W1204 09:30:32.601433798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4651025Z 2025-12-04T10:11:57.4651317Z [W1204 09:30:32.601574490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4651321Z 2025-12-04T10:11:57.4651416Z ('RERUN', {'yellow': True}) [0.5973s] [100%] 2025-12-04T10:11:57.4652186Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:33.188362203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4652190Z 2025-12-04T10:11:57.4652505Z [W1204 09:30:33.188898852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4652512Z 2025-12-04T10:11:57.4652804Z [W1204 09:30:33.189043585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4652810Z 2025-12-04T10:11:57.4653096Z [W1204 09:30:33.191974845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4653100Z 2025-12-04T10:11:57.4653393Z [W1204 09:30:33.192539584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4653396Z 2025-12-04T10:11:57.4653681Z [W1204 09:30:33.192676867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4653684Z 2025-12-04T10:11:57.4654095Z [W1204 09:30:33.197160983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4654099Z 2025-12-04T10:11:57.4654395Z [W1204 09:30:33.197614671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4654464Z 2025-12-04T10:11:57.4654759Z [W1204 09:30:33.197750633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4654762Z 2025-12-04T10:11:57.4654827Z FAILED [0.5972s] [100%] 2025-12-04T10:11:57.4654830Z 2025-12-04T10:11:57.4654927Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4655234Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4655311Z Traceback (most recent call last): 2025-12-04T10:11:57.4655637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4655708Z method(*args, **kwargs) 2025-12-04T10:11:57.4656008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4656074Z method(*args, **kwargs) 2025-12-04T10:11:57.4656365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4656430Z with policy(): 2025-12-04T10:11:57.4656729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.4656796Z raise RuntimeError(msg) 2025-12-04T10:11:57.4657635Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4657642Z 2025-12-04T10:11:57.4657784Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4658326Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4658332Z 2025-12-04T10:11:57.4658496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4658635Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4658739Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4659100Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4659244Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4659308Z graph_break [] 2025-12-04T10:11:57.4659441Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4660149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4660224Z if out == self.unknown_value: 2025-12-04T10:11:57.4660537Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4660612Z Traceback (most recent call last): 2025-12-04T10:11:57.4660918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4660990Z method(*args, **kwargs) 2025-12-04T10:11:57.4661351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4661415Z method(*args, **kwargs) 2025-12-04T10:11:57.4661704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4662133Z with policy(): 2025-12-04T10:11:57.4662444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4662511Z raise RuntimeError(msg) 2025-12-04T10:11:57.4663350Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4663357Z 2025-12-04T10:11:57.4663488Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4664016Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4664022Z 2025-12-04T10:11:57.4664188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4664319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4664419Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4664771Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4664899Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4664962Z graph_break [] 2025-12-04T10:11:57.4665087Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4665783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4665861Z if out == self.unknown_value: 2025-12-04T10:11:57.4665985Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4666079Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4666202Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4666543Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4666608Z graph_break [] 2025-12-04T10:11:57.4666690Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4666995Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4667067Z Traceback (most recent call last): 2025-12-04T10:11:57.4667377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4667446Z method(*args, **kwargs) 2025-12-04T10:11:57.4667736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4667798Z method(*args, **kwargs) 2025-12-04T10:11:57.4668095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4668155Z with policy(): 2025-12-04T10:11:57.4668447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4668513Z raise RuntimeError(msg) 2025-12-04T10:11:57.4669417Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4669487Z 2025-12-04T10:11:57.4669624Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4670148Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4670152Z 2025-12-04T10:11:57.4670316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4670440Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4670533Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4670882Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4671005Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4671068Z graph_break [] 2025-12-04T10:11:57.4671200Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4671894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4671966Z if out == self.unknown_value: 2025-12-04T10:11:57.4672087Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4672183Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4672305Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4672645Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4672711Z graph_break [] 2025-12-04T10:11:57.4672831Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4672917Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4673039Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4673446Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4673556Z graph_break [] 2025-12-04T10:11:57.4674150Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml - 2025-12-04T10:11:57.4674253Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4675574Z FAILED [0.5972s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4675583Z 2025-12-04T10:11:57.4675710Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4676325Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4676330Z 2025-12-04T10:11:57.4676498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4676610Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4676821Z ================== 1 failed, 57 deselected, 2 rerun in 12.33s ================== 2025-12-04T10:11:57.4676886Z Got exit code 1 2025-12-04T10:11:57.4677375Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4677624Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.4677891Z W1204 09:30:39.873000 36951 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4678282Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml 2025-12-04T10:11:57.4678393Z ============================= test session starts ============================== 2025-12-04T10:11:57.4678606Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4678681Z cachedir: .pytest_cache 2025-12-04T10:11:57.4678990Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4679068Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4679138Z configfile: 
pytest.ini 2025-12-04T10:11:57.4679453Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4679582Z collecting ... collected 58 items / 5 deselected / 53 selected 2025-12-04T10:11:57.4679676Z stepcurrent: skipping 5 already run items. 2025-12-04T10:11:57.4679747Z Running 53 items in this shard 2025-12-04T10:11:57.4679750Z 2025-12-04T10:11:57.4680350Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9428s] [ 1%] 2025-12-04T10:11:57.4680845Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5280s] [ 1%] 2025-12-04T10:11:57.4681287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.5350s] [ 1%] 2025-12-04T10:11:57.4681296Z 2025-12-04T10:11:57.4681381Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4681679Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4681755Z Traceback (most recent call last): 2025-12-04T10:11:57.4682064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4682133Z method(*args, **kwargs) 2025-12-04T10:11:57.4682435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4682499Z method(*args, **kwargs) 2025-12-04T10:11:57.4682794Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4682853Z with policy(): 2025-12-04T10:11:57.4683151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4683222Z raise RuntimeError(msg) 2025-12-04T10:11:57.4684110Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4684182Z 2025-12-04T10:11:57.4684319Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4684843Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4684847Z 2025-12-04T10:11:57.4685007Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4685139Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4685235Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4685789Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4685929Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4685994Z graph_break [] 2025-12-04T10:11:57.4686288Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 
2025-12-04T10:11:57.4686364Z Traceback (most recent call last): 2025-12-04T10:11:57.4686669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4686731Z method(*args, **kwargs) 2025-12-04T10:11:57.4687022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4687088Z method(*args, **kwargs) 2025-12-04T10:11:57.4687383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4687443Z with policy(): 2025-12-04T10:11:57.4687742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4687819Z raise RuntimeError(msg) 2025-12-04T10:11:57.4688648Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4688652Z 2025-12-04T10:11:57.4688781Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4689305Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4689309Z 2025-12-04T10:11:57.4689465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4689591Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4689690Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4690233Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4690361Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4690419Z graph_break [] 2025-12-04T10:11:57.4690541Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4690636Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4690831Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4691374Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4691519Z graph_break [] 2025-12-04T10:11:57.4691601Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4691893Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4691966Z 
Traceback (most recent call last): 2025-12-04T10:11:57.4692267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4692333Z method(*args, **kwargs) 2025-12-04T10:11:57.4692624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4692690Z method(*args, **kwargs) 2025-12-04T10:11:57.4692977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4693038Z with policy(): 2025-12-04T10:11:57.4693337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4693400Z raise RuntimeError(msg) 2025-12-04T10:11:57.4694217Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4694224Z 2025-12-04T10:11:57.4694350Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4694872Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4694877Z 2025-12-04T10:11:57.4695040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4695164Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4695256Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4695793Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4695917Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4695979Z graph_break [] 2025-12-04T10:11:57.4696102Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4696189Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4696314Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4696855Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4696917Z graph_break [] 2025-12-04T10:11:57.4697041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4697131Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4697257Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.4697864Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4697927Z graph_break [] 2025-12-04T10:11:57.4698425Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml - 2025-12-04T10:11:57.4698592Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4699884Z FAILED [0.5350s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4699892Z 2025-12-04T10:11:57.4700017Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4700545Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4700551Z 2025-12-04T10:11:57.4700708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4700818Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4700936Z =================== 1 failed, 5 deselected, 2 rerun in 3.03s =================== 2025-12-04T10:11:57.4700994Z Got exit code 1 2025-12-04T10:11:57.4701064Z Retrying single test... 
2025-12-04T10:11:57.4701327Z W1204 09:30:49.535000 37140 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4701720Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml 2025-12-04T10:11:57.4701814Z ============================= test session starts ============================== 2025-12-04T10:11:57.4702027Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4702102Z cachedir: .pytest_cache 2025-12-04T10:11:57.4702409Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4702495Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4702566Z configfile: pytest.ini 2025-12-04T10:11:57.4702884Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4703018Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4703592Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4703663Z Running 1 items in this shard 2025-12-04T10:11:57.4703666Z 2025-12-04T10:11:57.4704405Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:30:51.133610020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4704409Z 2025-12-04T10:11:57.4704706Z [W1204 09:31:00.111308444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4704710Z 2025-12-04T10:11:57.4705004Z [W1204 09:31:00.111578758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4705007Z 2025-12-04T10:11:57.4705385Z [W1204 09:31:00.117829945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4705389Z 2025-12-04T10:11:57.4705682Z [W1204 09:31:00.118433966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4705751Z 2025-12-04T10:11:57.4706041Z [W1204 09:31:00.118617479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4706045Z 2025-12-04T10:11:57.4706340Z [W1204 09:31:00.124084572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4706343Z 2025-12-04T10:11:57.4706631Z [W1204 09:31:00.124640912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4706635Z 2025-12-04T10:11:57.4706929Z [W1204 09:31:00.124797894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4706932Z 2025-12-04T10:11:57.4707014Z ('RERUN', {'yellow': True}) [10.9282s] [100%] 2025-12-04T10:11:57.4707739Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:00.924744509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4707746Z 2025-12-04T10:11:57.4708041Z [W1204 09:31:00.925259168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4708044Z 2025-12-04T10:11:57.4708330Z [W1204 09:31:00.925399560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4708333Z 2025-12-04T10:11:57.4708636Z [W1204 09:31:00.928243399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4708639Z 2025-12-04T10:11:57.4708930Z [W1204 09:31:00.928703027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4708935Z 2025-12-04T10:11:57.4709227Z [W1204 09:31:00.928841009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4709230Z 2025-12-04T10:11:57.4709515Z [W1204 09:31:01.933330455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4709519Z 2025-12-04T10:11:57.4709809Z [W1204 09:31:01.933792854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4709813Z 2025-12-04T10:11:57.4710106Z [W1204 09:31:01.933928416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4710109Z 2025-12-04T10:11:57.4710189Z ('RERUN', {'yellow': True}) [0.4998s] [100%] 2025-12-04T10:11:57.4710925Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:01.423124474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4710931Z 2025-12-04T10:11:57.4711226Z [W1204 09:31:01.423626882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4711230Z 2025-12-04T10:11:57.4711522Z [W1204 09:31:01.423766285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4711525Z 2025-12-04T10:11:57.4711883Z [W1204 09:31:01.426642214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4711886Z 2025-12-04T10:11:57.4712179Z [W1204 09:31:01.427083721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4712246Z 2025-12-04T10:11:57.4712538Z [W1204 09:31:01.427221134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4712541Z 2025-12-04T10:11:57.4712831Z [W1204 09:31:01.431794032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4712834Z 2025-12-04T10:11:57.4713124Z [W1204 09:31:01.432248700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4713127Z 2025-12-04T10:11:57.4713422Z [W1204 09:31:01.432396892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4713425Z 2025-12-04T10:11:57.4713486Z FAILED [0.4965s] [100%] 2025-12-04T10:11:57.4713489Z 2025-12-04T10:11:57.4713572Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4713873Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4713947Z Traceback (most recent call last): 2025-12-04T10:11:57.4714260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4714329Z method(*args, **kwargs) 2025-12-04T10:11:57.4714623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4714692Z method(*args, **kwargs) 2025-12-04T10:11:57.4715001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4715062Z with policy(): 2025-12-04T10:11:57.4715362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4715430Z raise RuntimeError(msg) 2025-12-04T10:11:57.4716235Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4716239Z 2025-12-04T10:11:57.4716366Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4716894Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4716902Z 2025-12-04T10:11:57.4717269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4717399Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4717500Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4718049Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4718184Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4718250Z graph_break [] 2025-12-04T10:11:57.4718376Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4719209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4719284Z if out == self.unknown_value: 2025-12-04T10:11:57.4719577Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4719749Z Traceback (most recent call last): 2025-12-04T10:11:57.4720088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4720159Z method(*args, **kwargs) 2025-12-04T10:11:57.4720449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4720510Z method(*args, **kwargs) 2025-12-04T10:11:57.4720816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4720880Z with policy(): 2025-12-04T10:11:57.4721173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4721242Z raise RuntimeError(msg) 2025-12-04T10:11:57.4722063Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4722070Z 2025-12-04T10:11:57.4722200Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4722719Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4722723Z 2025-12-04T10:11:57.4722893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4723021Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4723114Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4723664Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4723789Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4723853Z graph_break [] 2025-12-04T10:11:57.4723973Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4724661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4724737Z if out == self.unknown_value: 2025-12-04T10:11:57.4724855Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4724948Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4725072Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4725606Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4725669Z graph_break [] 2025-12-04T10:11:57.4725761Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4726055Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4726209Z Traceback (most recent call last): 2025-12-04T10:11:57.4726509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4726576Z method(*args, **kwargs) 2025-12-04T10:11:57.4726869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4727074Z method(*args, **kwargs) 2025-12-04T10:11:57.4727366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4727424Z with policy(): 2025-12-04T10:11:57.4727721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4727788Z raise RuntimeError(msg) 2025-12-04T10:11:57.4728620Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4728624Z 2025-12-04T10:11:57.4728754Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4729273Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4729276Z 2025-12-04T10:11:57.4729438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4729560Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4729648Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4730197Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4730320Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4730381Z graph_break [] 2025-12-04T10:11:57.4730506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4731196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4731270Z if out == self.unknown_value: 2025-12-04T10:11:57.4731400Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4731498Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4731622Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4732160Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4732224Z graph_break [] 2025-12-04T10:11:57.4732344Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4732434Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4732559Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4733097Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4733159Z graph_break [] 2025-12-04T10:11:57.4733744Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml - 2025-12-04T10:11:57.4733847Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4735145Z FAILED [0.4965s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4735213Z 2025-12-04T10:11:57.4735339Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4735863Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4735866Z 2025-12-04T10:11:57.4736023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4736141Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4736261Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.4736319Z Got exit code 1 2025-12-04T10:11:57.4736385Z Retrying single test... 2025-12-04T10:11:57.4736652Z W1204 09:31:07.982000 37334 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4737043Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml 2025-12-04T10:11:57.4737139Z ============================= test session starts ============================== 2025-12-04T10:11:57.4737353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4737425Z cachedir: .pytest_cache 2025-12-04T10:11:57.4737741Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4737828Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4737892Z configfile: pytest.ini 2025-12-04T10:11:57.4738204Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4738336Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4738908Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4738979Z Running 1 items in this shard 2025-12-04T10:11:57.4738985Z 2025-12-04T10:11:57.4739721Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:09.558200458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4739726Z 2025-12-04T10:11:57.4740025Z [W1204 09:31:18.657581822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4740033Z 2025-12-04T10:11:57.4740322Z [W1204 09:31:18.657839777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4740326Z 2025-12-04T10:11:57.4740615Z [W1204 09:31:18.663645796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4740619Z 2025-12-04T10:11:57.4741021Z [W1204 09:31:18.664205235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4741025Z 2025-12-04T10:11:57.4741315Z [W1204 09:31:18.664387829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4741385Z 2025-12-04T10:11:57.4741686Z [W1204 09:31:18.669735400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4741688Z 2025-12-04T10:11:57.4741979Z [W1204 09:31:18.670276819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4741986Z 2025-12-04T10:11:57.4742276Z [W1204 09:31:18.670439282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4742280Z 2025-12-04T10:11:57.4742373Z ('RERUN', {'yellow': True}) [11.0287s] [100%] 2025-12-04T10:11:57.4743112Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:19.465494855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4743118Z 2025-12-04T10:11:57.4743411Z [W1204 09:31:19.466006143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4743414Z 2025-12-04T10:11:57.4743708Z [W1204 09:31:19.466148626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4743711Z 2025-12-04T10:11:57.4744003Z [W1204 09:31:19.468989754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4744006Z 2025-12-04T10:11:57.4744305Z [W1204 09:31:19.469440682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4744308Z 2025-12-04T10:11:57.4744598Z [W1204 09:31:19.469580324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4744604Z 2025-12-04T10:11:57.4744892Z [W1204 09:31:19.473995540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4744902Z 2025-12-04T10:11:57.4745196Z [W1204 09:31:19.474445718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4745199Z 2025-12-04T10:11:57.4745489Z [W1204 09:31:19.474584110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4745492Z 2025-12-04T10:11:57.4745579Z ('RERUN', {'yellow': True}) [0.4925s] [100%] 2025-12-04T10:11:57.4746309Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:20.956503298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4746315Z 2025-12-04T10:11:57.4746613Z [W1204 09:31:20.957010497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4746617Z 2025-12-04T10:11:57.4746910Z [W1204 09:31:20.957153199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4746914Z 2025-12-04T10:11:57.4747209Z [W1204 09:31:20.959995088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4747212Z 2025-12-04T10:11:57.4747566Z [W1204 09:31:20.960471716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4747570Z 2025-12-04T10:11:57.4747863Z [W1204 09:31:20.960619338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4747931Z 2025-12-04T10:11:57.4748221Z [W1204 09:31:20.964999903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4748224Z 2025-12-04T10:11:57.4748511Z [W1204 09:31:20.965450101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4748519Z 2025-12-04T10:11:57.4748806Z [W1204 09:31:20.965585543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4748809Z 2025-12-04T10:11:57.4748870Z FAILED [0.4865s] [100%] 2025-12-04T10:11:57.4748874Z 2025-12-04T10:11:57.4748962Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4749258Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4749337Z Traceback (most recent call last): 2025-12-04T10:11:57.4749647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4749711Z method(*args, **kwargs) 2025-12-04T10:11:57.4750018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4750080Z method(*args, **kwargs) 2025-12-04T10:11:57.4750385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4750456Z with policy(): 2025-12-04T10:11:57.4750761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.4750837Z raise RuntimeError(msg) 2025-12-04T10:11:57.4751646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4751653Z 2025-12-04T10:11:57.4751784Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4752308Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4752312Z 2025-12-04T10:11:57.4752469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4752606Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4752702Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4753247Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4753384Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4753442Z graph_break [] 2025-12-04T10:11:57.4753572Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4754264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4754336Z if out == self.unknown_value: 2025-12-04T10:11:57.4754705Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4754782Z Traceback (most recent call last): 2025-12-04T10:11:57.4755086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4755242Z method(*args, **kwargs) 2025-12-04T10:11:57.4755538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4755606Z method(*args, **kwargs) 2025-12-04T10:11:57.4755902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4755962Z with policy(): 2025-12-04T10:11:57.4756261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4756327Z raise RuntimeError(msg) 2025-12-04T10:11:57.4757155Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4757162Z 2025-12-04T10:11:57.4757287Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4757815Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4757819Z 2025-12-04T10:11:57.4757974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4758103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4758204Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4758745Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4758881Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4758941Z graph_break [] 2025-12-04T10:11:57.4759063Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4759758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4759830Z if out == self.unknown_value: 2025-12-04T10:11:57.4760006Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4760100Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4760225Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4760774Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4760836Z graph_break [] 2025-12-04T10:11:57.4760930Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4761229Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4761301Z Traceback (most recent call last): 2025-12-04T10:11:57.4761603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4761671Z method(*args, **kwargs) 2025-12-04T10:11:57.4762036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4762111Z method(*args, **kwargs) 2025-12-04T10:11:57.4762404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4762536Z with policy(): 2025-12-04T10:11:57.4762838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4762906Z raise RuntimeError(msg) 2025-12-04T10:11:57.4763734Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4763737Z 2025-12-04T10:11:57.4763867Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4764394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4764400Z 2025-12-04T10:11:57.4764559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4764682Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4764781Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4765320Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4765453Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4765514Z graph_break [] 2025-12-04T10:11:57.4765637Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4766334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4766406Z if out == self.unknown_value: 2025-12-04T10:11:57.4766533Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4766622Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4766749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4767298Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4767357Z graph_break [] 2025-12-04T10:11:57.4767485Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4767588Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4767731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4768277Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4768335Z graph_break [] 2025-12-04T10:11:57.4768827Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml - 2025-12-04T10:11:57.4768939Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4770306Z FAILED [0.4865s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4770375Z 2025-12-04T10:11:57.4770516Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4771043Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4771046Z 2025-12-04T10:11:57.4771212Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4771322Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4771438Z ================== 1 failed, 57 deselected, 2 rerun in 12.03s ================== 2025-12-04T10:11:57.4771501Z Got exit code 1 2025-12-04T10:11:57.4771984Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4772253Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.4772524Z W1204 09:31:26.607000 37527 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4772915Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml 2025-12-04T10:11:57.4773019Z ============================= test session starts ============================== 2025-12-04T10:11:57.4773233Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4773307Z cachedir: .pytest_cache 2025-12-04T10:11:57.4773612Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4773694Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4773765Z configfile: pytest.ini 2025-12-04T10:11:57.4774085Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4774214Z collecting ... collected 58 items / 6 deselected / 52 selected 2025-12-04T10:11:57.4774305Z stepcurrent: skipping 6 already run items. 2025-12-04T10:11:57.4774378Z Running 52 items in this shard 2025-12-04T10:11:57.4774382Z 2025-12-04T10:11:57.4774889Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9379s] [ 1%] 2025-12-04T10:11:57.4775385Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5367s] [ 1%] 2025-12-04T10:11:57.4775844Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5346s] [ 1%] 2025-12-04T10:11:57.4775848Z 2025-12-04T10:11:57.4775930Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4776227Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4776308Z Traceback (most recent call last): 2025-12-04T10:11:57.4776687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4776760Z method(*args, **kwargs) 2025-12-04T10:11:57.4777052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.4777180Z method(*args, **kwargs) 2025-12-04T10:11:57.4777476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4777537Z with policy(): 2025-12-04T10:11:57.4777834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4777904Z raise RuntimeError(msg) 2025-12-04T10:11:57.4778716Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4778720Z 2025-12-04T10:11:57.4778858Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4779380Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4779386Z 2025-12-04T10:11:57.4779552Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4779677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4779774Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4780331Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4780463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4780527Z graph_break [] 2025-12-04T10:11:57.4780817Z _ 
TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4780893Z Traceback (most recent call last): 2025-12-04T10:11:57.4781203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4781272Z method(*args, **kwargs) 2025-12-04T10:11:57.4781566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4781636Z method(*args, **kwargs) 2025-12-04T10:11:57.4781926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4781992Z with policy(): 2025-12-04T10:11:57.4782290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4782357Z raise RuntimeError(msg) 2025-12-04T10:11:57.4783186Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4783192Z 2025-12-04T10:11:57.4783316Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4783840Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4783844Z 2025-12-04T10:11:57.4784069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4784199Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4784297Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4784853Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4785070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4785130Z graph_break [] 2025-12-04T10:11:57.4785256Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4785351Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4785473Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4786016Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4786076Z graph_break [] 2025-12-04T10:11:57.4786161Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4786461Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4786536Z 
Traceback (most recent call last): 2025-12-04T10:11:57.4786838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4786907Z method(*args, **kwargs) 2025-12-04T10:11:57.4787200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4787278Z method(*args, **kwargs) 2025-12-04T10:11:57.4787575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4787637Z with policy(): 2025-12-04T10:11:57.4787938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4788012Z raise RuntimeError(msg) 2025-12-04T10:11:57.4788838Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4788842Z 2025-12-04T10:11:57.4788967Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4789491Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4789500Z 2025-12-04T10:11:57.4789656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4789779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4789878Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4790421Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4790550Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4790619Z graph_break [] 2025-12-04T10:11:57.4790744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4790850Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4791048Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4791587Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4791717Z graph_break [] 2025-12-04T10:11:57.4791840Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4791937Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4792061Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.4792596Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4792659Z graph_break [] 2025-12-04T10:11:57.4793148Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml - 2025-12-04T10:11:57.4793253Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4794549Z FAILED [0.5346s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4794553Z 2025-12-04T10:11:57.4794682Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4795207Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4795210Z 2025-12-04T10:11:57.4795363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4795473Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4795587Z =================== 1 failed, 6 deselected, 2 rerun in 3.03s =================== 2025-12-04T10:11:57.4795653Z Got exit code 1 2025-12-04T10:11:57.4795717Z Retrying single test... 
2025-12-04T10:11:57.4795983Z W1204 09:31:36.178000 37716 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4796371Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml 2025-12-04T10:11:57.4796471Z ============================= test session starts ============================== 2025-12-04T10:11:57.4796681Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4796756Z cachedir: .pytest_cache 2025-12-04T10:11:57.4797064Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4797150Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4797218Z configfile: pytest.ini 2025-12-04T10:11:57.4797547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4797688Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4798261Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4798406Z Running 1 items in this shard 2025-12-04T10:11:57.4798415Z 2025-12-04T10:11:57.4799150Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:37.767041253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4799219Z 2025-12-04T10:11:57.4799524Z [W1204 09:31:47.962273885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4799533Z 2025-12-04T10:11:57.4799827Z [W1204 09:31:47.962529549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4799831Z 2025-12-04T10:11:57.4800213Z [W1204 09:31:47.968336088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4800216Z 2025-12-04T10:11:57.4800515Z [W1204 09:31:47.968929309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4800519Z 2025-12-04T10:11:57.4800811Z [W1204 09:31:47.969115432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4800816Z 2025-12-04T10:11:57.4801114Z [W1204 09:31:47.974502114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4801117Z 2025-12-04T10:11:57.4801406Z [W1204 09:31:47.975044133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4801410Z 2025-12-04T10:11:57.4801704Z [W1204 09:31:47.975203636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4801708Z 2025-12-04T10:11:57.4801794Z ('RERUN', {'yellow': True}) [11.1389s] [100%] 2025-12-04T10:11:57.4802526Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:47.777956342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4802538Z 2025-12-04T10:11:57.4802830Z [W1204 09:31:47.778482331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4802834Z 2025-12-04T10:11:57.4803124Z [W1204 09:31:47.778620293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4803127Z 2025-12-04T10:11:57.4803420Z [W1204 09:31:47.781593514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4803423Z 2025-12-04T10:11:57.4803715Z [W1204 09:31:47.782054621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4803718Z 2025-12-04T10:11:57.4804015Z [W1204 09:31:47.782191474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4804021Z 2025-12-04T10:11:57.4804320Z [W1204 09:31:47.786757072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4804325Z 2025-12-04T10:11:57.4804623Z [W1204 09:31:47.787213480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4804626Z 2025-12-04T10:11:57.4804915Z [W1204 09:31:47.787349512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4804919Z 2025-12-04T10:11:57.4805079Z ('RERUN', {'yellow': True}) [0.5056s] [100%] 2025-12-04T10:11:57.4805806Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:48.296193198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4805873Z 2025-12-04T10:11:57.4806165Z [W1204 09:31:48.296718747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4806175Z 2025-12-04T10:11:57.4806468Z [W1204 09:31:48.296860140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4806471Z 2025-12-04T10:11:57.4806761Z [W1204 09:31:48.299783579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4806765Z 2025-12-04T10:11:57.4807062Z [W1204 09:31:48.300255838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4807065Z 2025-12-04T10:11:57.4807352Z [W1204 09:31:48.300408670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4807358Z 2025-12-04T10:11:57.4807653Z [W1204 09:31:48.304935187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4807656Z 2025-12-04T10:11:57.4807945Z [W1204 09:31:48.305384995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4807948Z 2025-12-04T10:11:57.4808243Z [W1204 09:31:48.305521017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4808246Z 2025-12-04T10:11:57.4808311Z FAILED [0.5075s] [100%] 2025-12-04T10:11:57.4808314Z 2025-12-04T10:11:57.4808403Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4808706Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4808786Z Traceback (most recent call last): 2025-12-04T10:11:57.4809102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4809167Z method(*args, **kwargs) 2025-12-04T10:11:57.4809464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4809536Z method(*args, **kwargs) 2025-12-04T10:11:57.4809829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4809899Z with policy(): 2025-12-04T10:11:57.4810200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4810267Z raise RuntimeError(msg) 2025-12-04T10:11:57.4811086Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4811093Z 2025-12-04T10:11:57.4811223Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4811752Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4811755Z 2025-12-04T10:11:57.4811918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4812133Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4812240Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4812783Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4812986Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4813048Z graph_break [] 2025-12-04T10:11:57.4813174Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4813880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4813955Z if out == self.unknown_value: 2025-12-04T10:11:57.4814255Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4814328Z Traceback (most recent call last): 2025-12-04T10:11:57.4814631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4814706Z method(*args, **kwargs) 2025-12-04T10:11:57.4814999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4815061Z method(*args, **kwargs) 2025-12-04T10:11:57.4815359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4815421Z with policy(): 2025-12-04T10:11:57.4815723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4815796Z raise RuntimeError(msg) 2025-12-04T10:11:57.4816617Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4816629Z 2025-12-04T10:11:57.4816757Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4817459Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4817462Z 2025-12-04T10:11:57.4817628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4817757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4817863Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4818407Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4818538Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4818604Z graph_break [] 2025-12-04T10:11:57.4818725Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4819414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4819492Z if out == self.unknown_value: 2025-12-04T10:11:57.4819728Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4819828Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4819952Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4820501Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4820660Z graph_break [] 2025-12-04T10:11:57.4820744Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4821038Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4821110Z Traceback (most recent call last): 2025-12-04T10:11:57.4821413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4821484Z method(*args, **kwargs) 2025-12-04T10:11:57.4821784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4821847Z method(*args, **kwargs) 2025-12-04T10:11:57.4822147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4822208Z with policy(): 2025-12-04T10:11:57.4822508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4822588Z raise RuntimeError(msg) 2025-12-04T10:11:57.4823419Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4823430Z 2025-12-04T10:11:57.4823556Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4824075Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4824081Z 2025-12-04T10:11:57.4824243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4824366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4824462Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4825003Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4825129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4825194Z graph_break [] 2025-12-04T10:11:57.4825319Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4826016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4826090Z if out == self.unknown_value: 2025-12-04T10:11:57.4826212Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4826309Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4826431Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4827045Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4827110Z graph_break [] 2025-12-04T10:11:57.4827231Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4827325Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4827512Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4828051Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4828117Z graph_break [] 2025-12-04T10:11:57.4828604Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml - 2025-12-04T10:11:57.4828712Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4830008Z FAILED [0.5075s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4830015Z 2025-12-04T10:11:57.4830144Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4830663Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4830667Z 2025-12-04T10:11:57.4830825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4830937Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4831053Z ================== 1 failed, 57 deselected, 2 rerun in 12.18s ================== 2025-12-04T10:11:57.4831123Z Got exit code 1 2025-12-04T10:11:57.4831187Z Retrying single test... 2025-12-04T10:11:57.4831450Z W1204 09:31:54.909000 37910 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4831841Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml 2025-12-04T10:11:57.4831936Z ============================= test session starts ============================== 2025-12-04T10:11:57.4832153Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4832221Z cachedir: .pytest_cache 2025-12-04T10:11:57.4832530Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4832613Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4832678Z configfile: pytest.ini 2025-12-04T10:11:57.4832996Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4833133Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4833703Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4833786Z Running 1 items in this shard 2025-12-04T10:11:57.4833790Z 2025-12-04T10:11:57.4834598Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:56.509788887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4834603Z 2025-12-04T10:11:57.4834909Z [W1204 09:32:05.750867435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4834977Z 2025-12-04T10:11:57.4835272Z [W1204 09:32:05.751128189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4835276Z 2025-12-04T10:11:57.4835566Z [W1204 09:32:05.756947319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4835575Z 2025-12-04T10:11:57.4835865Z [W1204 09:32:05.757551830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4835869Z 2025-12-04T10:11:57.4836158Z [W1204 09:32:05.757734133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4836161Z 2025-12-04T10:11:57.4836461Z [W1204 09:32:05.763097335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4836467Z 2025-12-04T10:11:57.4836757Z [W1204 09:32:05.763637634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4836760Z 2025-12-04T10:11:57.4837055Z [W1204 09:32:05.763800517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4837058Z 2025-12-04T10:11:57.4837141Z ('RERUN', {'yellow': True}) [11.1941s] [100%] 2025-12-04T10:11:57.4837883Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:32:06.566384240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4837888Z 2025-12-04T10:11:57.4838183Z [W1204 09:32:06.566880988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4838189Z 2025-12-04T10:11:57.4838487Z [W1204 09:32:06.567020251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4838490Z 2025-12-04T10:11:57.4838781Z [W1204 09:32:06.569899610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4838784Z 2025-12-04T10:11:57.4839074Z [W1204 09:32:06.570385918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4839077Z 2025-12-04T10:11:57.4839372Z [W1204 09:32:06.570527940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4839375Z 2025-12-04T10:11:57.4839664Z [W1204 09:32:06.574949046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4839671Z 2025-12-04T10:11:57.4840008Z [W1204 09:32:06.575397154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4840011Z 2025-12-04T10:11:57.4840298Z [W1204 09:32:06.575531876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4840301Z 2025-12-04T10:11:57.4840384Z ('RERUN', {'yellow': True}) [0.5008s] [100%] 2025-12-04T10:11:57.4841275Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:32:07.066086491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4841279Z 2025-12-04T10:11:57.4841576Z [W1204 09:32:07.066592690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4841644Z 2025-12-04T10:11:57.4841931Z [W1204 09:32:07.066731342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4841934Z 2025-12-04T10:11:57.4842220Z [W1204 09:32:07.069597131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4842227Z 2025-12-04T10:11:57.4842519Z [W1204 09:32:07.070060799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4842523Z 2025-12-04T10:11:57.4842813Z [W1204 09:32:07.070202162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4842816Z 2025-12-04T10:11:57.4843121Z [W1204 09:32:07.074655638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4843128Z 2025-12-04T10:11:57.4843419Z [W1204 09:32:07.075105216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4843422Z 2025-12-04T10:11:57.4843711Z [W1204 09:32:07.075242259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4843715Z 2025-12-04T10:11:57.4843776Z FAILED [0.5016s] [100%] 2025-12-04T10:11:57.4843780Z 2025-12-04T10:11:57.4843863Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4844158Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4844231Z Traceback (most recent call last): 2025-12-04T10:11:57.4844540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4844603Z method(*args, **kwargs) 2025-12-04T10:11:57.4844900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4844966Z method(*args, **kwargs) 2025-12-04T10:11:57.4845255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4845318Z with policy(): 2025-12-04T10:11:57.4845609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.4845675Z raise RuntimeError(msg) 2025-12-04T10:11:57.4846485Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4846492Z 2025-12-04T10:11:57.4846618Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4847142Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4847146Z 2025-12-04T10:11:57.4847304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4847436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4847538Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4848158Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4848291Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4848414Z graph_break [] 2025-12-04T10:11:57.4848538Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4849230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4849299Z if out == self.unknown_value: 2025-12-04T10:11:57.4849592Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4849664Z Traceback (most recent call last): 2025-12-04T10:11:57.4849963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4850031Z method(*args, **kwargs) 2025-12-04T10:11:57.4850323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4850392Z method(*args, **kwargs) 2025-12-04T10:11:57.4850683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4850742Z with policy(): 2025-12-04T10:11:57.4851039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4851105Z raise RuntimeError(msg) 2025-12-04T10:11:57.4851925Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4851933Z 2025-12-04T10:11:57.4852059Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4852577Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4852581Z 2025-12-04T10:11:57.4852741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4852865Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4852963Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4853509Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4853634Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4853697Z graph_break [] 2025-12-04T10:11:57.4853820Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4854517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4854586Z if out == self.unknown_value: 2025-12-04T10:11:57.4854710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4854817Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4854941Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4855561Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4855623Z graph_break [] 2025-12-04T10:11:57.4855772Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4856065Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.4856138Z Traceback (most recent call last): 2025-12-04T10:11:57.4856436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4856505Z method(*args, **kwargs) 2025-12-04T10:11:57.4856799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4856866Z method(*args, **kwargs) 2025-12-04T10:11:57.4857159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4857220Z with policy(): 2025-12-04T10:11:57.4857520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4857589Z raise RuntimeError(msg) 2025-12-04T10:11:57.4858419Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4858423Z 2025-12-04T10:11:57.4858548Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4859063Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4859067Z 2025-12-04T10:11:57.4859229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4859353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4859449Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4859993Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4860118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4860180Z graph_break [] 2025-12-04T10:11:57.4860303Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4860994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4861062Z if out == self.unknown_value: 2025-12-04T10:11:57.4861185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4861280Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4861401Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4861945Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4862005Z graph_break [] 2025-12-04T10:11:57.4862129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4862292Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4862414Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4862947Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4863096Z graph_break [] 2025-12-04T10:11:57.4863584Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml - 2025-12-04T10:11:57.4863691Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4864988Z FAILED [0.5016s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4864995Z 2025-12-04T10:11:57.4865122Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4865642Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4865646Z 2025-12-04T10:11:57.4865806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4865908Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4866028Z ================== 1 failed, 57 deselected, 2 rerun in 12.22s ================== 2025-12-04T10:11:57.4866093Z Got exit code 1 2025-12-04T10:11:57.4866564Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.4866813Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.4867080Z W1204 09:32:13.708000 38104 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4867467Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml 2025-12-04T10:11:57.4867565Z ============================= test session starts ============================== 2025-12-04T10:11:57.4867774Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4867840Z cachedir: .pytest_cache 2025-12-04T10:11:57.4868148Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4868224Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4868291Z configfile: pytest.ini 2025-12-04T10:11:57.4868608Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4868747Z collecting ... collected 58 items / 7 deselected / 51 selected 2025-12-04T10:11:57.4868839Z stepcurrent: skipping 7 already run items. 2025-12-04T10:11:57.4868907Z Running 51 items in this shard 2025-12-04T10:11:57.4868910Z 2025-12-04T10:11:57.4869413Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8969s] [ 1%] 2025-12-04T10:11:57.4869985Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4780s] [ 1%] 2025-12-04T10:11:57.4870436Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4686s] [ 1%] 2025-12-04T10:11:57.4870504Z 2025-12-04T10:11:57.4870592Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4870885Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4870962Z Traceback (most recent call last): 2025-12-04T10:11:57.4871266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4871332Z method(*args, **kwargs) 2025-12-04T10:11:57.4871633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.4871697Z method(*args, **kwargs) 2025-12-04T10:11:57.4871988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4872054Z with policy(): 2025-12-04T10:11:57.4872349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4872419Z raise RuntimeError(msg) 2025-12-04T10:11:57.4873226Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4873230Z 2025-12-04T10:11:57.4873360Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4873887Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4873892Z 2025-12-04T10:11:57.4874048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4874176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4874272Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4874621Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4874749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4874808Z graph_break [] 2025-12-04T10:11:57.4875108Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.4875180Z Traceback (most recent call last): 2025-12-04T10:11:57.4875476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4875553Z method(*args, **kwargs) 2025-12-04T10:11:57.4875845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4875910Z method(*args, **kwargs) 2025-12-04T10:11:57.4876199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4876257Z with policy(): 2025-12-04T10:11:57.4876554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4876623Z raise RuntimeError(msg) 2025-12-04T10:11:57.4877520Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4877587Z 2025-12-04T10:11:57.4877714Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4878232Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4878239Z 2025-12-04T10:11:57.4878395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4878519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4878614Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4878964Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4879087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4879151Z graph_break [] 2025-12-04T10:11:57.4879277Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4879372Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4879490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4879843Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4879950Z graph_break [] 2025-12-04T10:11:57.4880036Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4880333Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4880409Z Traceback (most recent call last): 2025-12-04T10:11:57.4880708Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4880778Z method(*args, **kwargs) 2025-12-04T10:11:57.4881067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4881133Z method(*args, **kwargs) 2025-12-04T10:11:57.4881427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4881485Z with policy(): 2025-12-04T10:11:57.4881782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4881852Z raise RuntimeError(msg) 2025-12-04T10:11:57.4882670Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4882677Z 2025-12-04T10:11:57.4882802Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4883319Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4883324Z 2025-12-04T10:11:57.4883481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4883603Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4883694Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4884116Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4884241Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4884366Z graph_break [] 2025-12-04T10:11:57.4884488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4884575Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4884700Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4885038Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4885097Z graph_break [] 2025-12-04T10:11:57.4885221Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4885307Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4885435Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4885772Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4885831Z graph_break [] 2025-12-04T10:11:57.4886322Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml - 2025-12-04T10:11:57.4886421Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4887725Z FAILED [0.4686s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4887729Z 2025-12-04T10:11:57.4887855Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4888381Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4888385Z 2025-12-04T10:11:57.4888542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4888645Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4888773Z =================== 1 failed, 7 deselected, 2 rerun in 2.87s =================== 2025-12-04T10:11:57.4888833Z Got exit code 1 2025-12-04T10:11:57.4888897Z Retrying single test... 
2025-12-04T10:11:57.4889166Z W1204 09:32:23.368000 38292 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4889554Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml 2025-12-04T10:11:57.4889656Z ============================= test session starts ============================== 2025-12-04T10:11:57.4889864Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4889930Z cachedir: .pytest_cache 2025-12-04T10:11:57.4890238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4890313Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4890380Z configfile: pytest.ini 2025-12-04T10:11:57.4890766Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4890894Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4891472Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4891635Z Running 1 items in this shard 2025-12-04T10:11:57.4891639Z 2025-12-04T10:11:57.4892378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:24.662379778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4892382Z 2025-12-04T10:11:57.4892681Z [W1204 09:32:33.900461285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4892685Z 2025-12-04T10:11:57.4892985Z [W1204 09:32:33.900713640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4892989Z 2025-12-04T10:11:57.4893277Z [W1204 09:32:33.906515510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4893283Z 2025-12-04T10:11:57.4893571Z [W1204 09:32:33.907124810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4893574Z 2025-12-04T10:11:57.4893872Z [W1204 09:32:33.907296653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4893875Z 2025-12-04T10:11:57.4894160Z [W1204 09:32:33.912789468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4894163Z 2025-12-04T10:11:57.4894461Z [W1204 09:32:33.913329287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4894465Z 2025-12-04T10:11:57.4894751Z [W1204 09:32:33.913493520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4894757Z 2025-12-04T10:11:57.4894841Z ('RERUN', {'yellow': True}) [11.1534s] [100%] 2025-12-04T10:11:57.4895571Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:34.934545480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4895574Z 2025-12-04T10:11:57.4895868Z [W1204 09:32:34.935069449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4895872Z 2025-12-04T10:11:57.4896161Z [W1204 09:32:34.935208421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4896164Z 2025-12-04T10:11:57.4896453Z [W1204 09:32:34.938117661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4896462Z 2025-12-04T10:11:57.4896752Z [W1204 09:32:34.938676821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4896755Z 2025-12-04T10:11:57.4897042Z [W1204 09:32:34.938817933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4897046Z 2025-12-04T10:11:57.4897338Z [W1204 09:32:35.943361821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4897342Z 2025-12-04T10:11:57.4897710Z [W1204 09:32:35.943818549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4897714Z 2025-12-04T10:11:57.4898010Z [W1204 09:32:35.943956642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4898076Z 2025-12-04T10:11:57.4898156Z ('RERUN', {'yellow': True}) [0.4446s] [100%] 2025-12-04T10:11:57.4898891Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:35.379063070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4898894Z 2025-12-04T10:11:57.4899184Z [W1204 09:32:35.379586549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4899188Z 2025-12-04T10:11:57.4899487Z [W1204 09:32:35.379727122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4899491Z 2025-12-04T10:11:57.4899777Z [W1204 09:32:35.382658992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4899783Z 2025-12-04T10:11:57.4900072Z [W1204 09:32:35.383211512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4900080Z 2025-12-04T10:11:57.4900368Z [W1204 09:32:35.383348644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4900371Z 2025-12-04T10:11:57.4900661Z [W1204 09:32:35.387831571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4900664Z 2025-12-04T10:11:57.4900973Z [W1204 09:32:35.388282969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4900977Z 2025-12-04T10:11:57.4901264Z [W1204 09:32:35.388429032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4901270Z 2025-12-04T10:11:57.4901335Z FAILED [0.4426s] [100%] 2025-12-04T10:11:57.4901338Z 2025-12-04T10:11:57.4901425Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4901724Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4901802Z Traceback (most recent call last): 2025-12-04T10:11:57.4902110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4902183Z method(*args, **kwargs) 2025-12-04T10:11:57.4902479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4902543Z method(*args, **kwargs) 2025-12-04T10:11:57.4902840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4902907Z with policy(): 2025-12-04T10:11:57.4903201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4903271Z raise RuntimeError(msg) 2025-12-04T10:11:57.4904083Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4904087Z 2025-12-04T10:11:57.4904225Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4904820Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4904824Z 2025-12-04T10:11:57.4905049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4905175Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4905270Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4905624Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4905749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4905811Z graph_break [] 2025-12-04T10:11:57.4905936Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4906636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4906716Z if out == self.unknown_value: 2025-12-04T10:11:57.4907009Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4907090Z Traceback (most recent call last): 2025-12-04T10:11:57.4907399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4907464Z method(*args, **kwargs) 2025-12-04T10:11:57.4907760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4907822Z method(*args, **kwargs) 2025-12-04T10:11:57.4908114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4908180Z with policy(): 2025-12-04T10:11:57.4908472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4908543Z raise RuntimeError(msg) 2025-12-04T10:11:57.4909367Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4909371Z 2025-12-04T10:11:57.4909495Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4910026Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4910030Z 2025-12-04T10:11:57.4910185Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4910314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4910410Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4910760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4910887Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4910943Z graph_break [] 2025-12-04T10:11:57.4911072Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4911837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4911907Z if out == self.unknown_value: 2025-12-04T10:11:57.4912032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4912189Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4912318Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4912662Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4912722Z graph_break [] 2025-12-04T10:11:57.4912808Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4913108Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4913182Z Traceback (most recent call last): 2025-12-04T10:11:57.4913489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4913552Z method(*args, **kwargs) 2025-12-04T10:11:57.4913845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4913910Z method(*args, **kwargs) 2025-12-04T10:11:57.4914198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4914264Z with policy(): 2025-12-04T10:11:57.4914555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4914623Z raise RuntimeError(msg) 2025-12-04T10:11:57.4915450Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4915454Z 2025-12-04T10:11:57.4915581Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4916107Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4916110Z 2025-12-04T10:11:57.4916267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4916393Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4916483Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4916828Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4916959Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4917123Z graph_break [] 2025-12-04T10:11:57.4917251Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4917942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4918009Z if out == self.unknown_value: 2025-12-04T10:11:57.4918137Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4918225Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4918353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4918815Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4918879Z graph_break [] 2025-12-04T10:11:57.4919019Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4919200Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4919321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4919666Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4919723Z graph_break [] 2025-12-04T10:11:57.4920246Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml - 2025-12-04T10:11:57.4920349Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4921652Z FAILED [0.4426s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4921667Z 2025-12-04T10:11:57.4921789Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4922308Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4922312Z 2025-12-04T10:11:57.4922473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4922575Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4922706Z ================== 1 failed, 57 deselected, 2 rerun in 12.06s ================== 2025-12-04T10:11:57.4922770Z Got exit code 1 2025-12-04T10:11:57.4922840Z Retrying single test... 2025-12-04T10:11:57.4923125Z W1204 09:32:42.025000 38485 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4923511Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml 2025-12-04T10:11:57.4923611Z ============================= test session starts ============================== 2025-12-04T10:11:57.4923845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4923913Z cachedir: .pytest_cache 2025-12-04T10:11:57.4924230Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4924305Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4924370Z configfile: pytest.ini 2025-12-04T10:11:57.4924687Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4924815Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4925389Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4925459Z Running 1 items in this shard 2025-12-04T10:11:57.4925463Z 2025-12-04T10:11:57.4926268Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:43.322060700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4926277Z 2025-12-04T10:11:57.4926577Z [W1204 09:32:52.197663170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4926645Z 2025-12-04T10:11:57.4926937Z [W1204 09:32:52.197911265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4926940Z 2025-12-04T10:11:57.4927233Z [W1204 09:32:52.203659784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4927236Z 2025-12-04T10:11:57.4927548Z [W1204 09:32:52.204267884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4927552Z 2025-12-04T10:11:57.4927900Z [W1204 09:32:52.204449087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4927903Z 2025-12-04T10:11:57.4928247Z [W1204 09:32:52.209825500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4928254Z 2025-12-04T10:11:57.4928602Z [W1204 09:32:52.210363229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4928606Z 2025-12-04T10:11:57.4928954Z [W1204 09:32:52.210529682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4928958Z 2025-12-04T10:11:57.4929055Z ('RERUN', {'yellow': True}) [10.7953s] [100%] 2025-12-04T10:11:57.4929902Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:53.236600564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4929906Z 2025-12-04T10:11:57.4930194Z [W1204 09:32:53.237122413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4930204Z 2025-12-04T10:11:57.4930491Z [W1204 09:32:53.237264965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4930494Z 2025-12-04T10:11:57.4930781Z [W1204 09:32:53.240151695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4930785Z 2025-12-04T10:11:57.4931077Z [W1204 09:32:53.240708464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4931081Z 2025-12-04T10:11:57.4931370Z [W1204 09:32:53.240850847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4931373Z 2025-12-04T10:11:57.4931665Z [W1204 09:32:53.245336364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4931668Z 2025-12-04T10:11:57.4931958Z [W1204 09:32:53.245790192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4931961Z 2025-12-04T10:11:57.4932257Z [W1204 09:32:53.245925174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4932260Z 2025-12-04T10:11:57.4932338Z ('RERUN', {'yellow': True}) [0.4499s] [100%] 2025-12-04T10:11:57.4933136Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:53.684496682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4933146Z 2025-12-04T10:11:57.4933435Z [W1204 09:32:53.685014021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4933502Z 2025-12-04T10:11:57.4933792Z [W1204 09:32:53.685153033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4933795Z 2025-12-04T10:11:57.4934086Z [W1204 09:32:53.687987522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4934090Z 2025-12-04T10:11:57.4934376Z [W1204 09:32:53.688535731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4934379Z 2025-12-04T10:11:57.4934680Z [W1204 09:32:53.688679264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4934683Z 2025-12-04T10:11:57.4934971Z [W1204 09:32:53.693195522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4934974Z 2025-12-04T10:11:57.4935268Z [W1204 09:32:53.693645199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4935271Z 2025-12-04T10:11:57.4935561Z [W1204 09:32:53.693779762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4935565Z 2025-12-04T10:11:57.4935633Z FAILED [0.4469s] [100%] 2025-12-04T10:11:57.4935638Z 2025-12-04T10:11:57.4935728Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4936026Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4936109Z Traceback (most recent call last): 2025-12-04T10:11:57.4936417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4936482Z method(*args, **kwargs) 2025-12-04T10:11:57.4936783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4936844Z method(*args, **kwargs) 2025-12-04T10:11:57.4937140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4937198Z with policy(): 2025-12-04T10:11:57.4937508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.4937591Z raise RuntimeError(msg) 2025-12-04T10:11:57.4938566Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4938571Z 2025-12-04T10:11:57.4938724Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4939343Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4939346Z 2025-12-04T10:11:57.4939507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4939635Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4939729Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4940161Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4940290Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4940350Z graph_break [] 2025-12-04T10:11:57.4940477Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4941239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4941312Z if out == self.unknown_value: 2025-12-04T10:11:57.4941607Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4941679Z Traceback (most recent call last): 2025-12-04T10:11:57.4941987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4942050Z method(*args, **kwargs) 2025-12-04T10:11:57.4942354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4942426Z method(*args, **kwargs) 2025-12-04T10:11:57.4942720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4942783Z with policy(): 2025-12-04T10:11:57.4943076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4943144Z raise RuntimeError(msg) 2025-12-04T10:11:57.4943973Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4943977Z 2025-12-04T10:11:57.4944101Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4944630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4944636Z 2025-12-04T10:11:57.4944795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4944921Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4945018Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4945370Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4945497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4945559Z graph_break [] 2025-12-04T10:11:57.4945681Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4946376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4946449Z if out == self.unknown_value: 2025-12-04T10:11:57.4946576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4946666Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4946790Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4947141Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4947292Z graph_break [] 2025-12-04T10:11:57.4947382Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4947676Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.4947815Z Traceback (most recent call last): 2025-12-04T10:11:57.4948119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4948181Z method(*args, **kwargs) 2025-12-04T10:11:57.4948474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4948542Z method(*args, **kwargs) 2025-12-04T10:11:57.4948835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4948900Z with policy(): 2025-12-04T10:11:57.4949193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4949258Z raise RuntimeError(msg) 2025-12-04T10:11:57.4950085Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4950091Z 2025-12-04T10:11:57.4950215Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4950741Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4950745Z 2025-12-04T10:11:57.4950900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4951032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4951125Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4951468Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4951602Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4951660Z graph_break [] 2025-12-04T10:11:57.4951783Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4952475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4952541Z if out == self.unknown_value: 2025-12-04T10:11:57.4952669Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4952770Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4952895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4953247Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4953306Z graph_break [] 2025-12-04T10:11:57.4953430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4953517Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4953634Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4953979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4954036Z graph_break [] 2025-12-04T10:11:57.4954673Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml - 2025-12-04T10:11:57.4954779Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4956139Z FAILED [0.4469s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4956147Z 2025-12-04T10:11:57.4956270Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4956797Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4956801Z 2025-12-04T10:11:57.4956960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4957065Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4957178Z ================== 1 failed, 57 deselected, 2 rerun in 11.72s ================== 2025-12-04T10:11:57.4957240Z Got exit code 1 2025-12-04T10:11:57.4957713Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.4957969Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.4958232Z W1204 09:33:00.310000 38678 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4958623Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml 2025-12-04T10:11:57.4958731Z ============================= test session starts ============================== 2025-12-04T10:11:57.4958939Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4959008Z cachedir: .pytest_cache 2025-12-04T10:11:57.4959313Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4959388Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4959458Z configfile: 
pytest.ini 2025-12-04T10:11:57.4959777Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4959957Z collecting ... collected 58 items / 8 deselected / 50 selected 2025-12-04T10:11:57.4960059Z stepcurrent: skipping 8 already run items. 2025-12-04T10:11:57.4960133Z Running 50 items in this shard 2025-12-04T10:11:57.4960137Z 2025-12-04T10:11:57.4960641Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9320s] [ 2%] 2025-12-04T10:11:57.4961131Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5370s] [ 2%] 2025-12-04T10:11:57.4961587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5338s] [ 2%] 2025-12-04T10:11:57.4961591Z 2025-12-04T10:11:57.4961747Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4962042Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4962122Z Traceback (most recent call last): 2025-12-04T10:11:57.4962427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4962561Z method(*args, **kwargs) 2025-12-04T10:11:57.4962858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4962920Z method(*args, **kwargs) 2025-12-04T10:11:57.4963215Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4963275Z with policy(): 2025-12-04T10:11:57.4963572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4963654Z raise RuntimeError(msg) 2025-12-04T10:11:57.4964454Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4964461Z 2025-12-04T10:11:57.4964593Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4965113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4965116Z 2025-12-04T10:11:57.4965280Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4965410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4965509Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4966057Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4966189Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4966257Z graph_break [] 2025-12-04T10:11:57.4966555Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 
2025-12-04T10:11:57.4966631Z Traceback (most recent call last): 2025-12-04T10:11:57.4966933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4966995Z method(*args, **kwargs) 2025-12-04T10:11:57.4967288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4967356Z method(*args, **kwargs) 2025-12-04T10:11:57.4967647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4967712Z with policy(): 2025-12-04T10:11:57.4968014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4968082Z raise RuntimeError(msg) 2025-12-04T10:11:57.4968895Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.4968900Z 2025-12-04T10:11:57.4969028Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4969624Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4969628Z 2025-12-04T10:11:57.4969797Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4970008Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4970106Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4970649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4970780Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4970838Z graph_break [] 2025-12-04T10:11:57.4970964Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4971058Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4971178Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4971721Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4971784Z graph_break [] 2025-12-04T10:11:57.4971865Z =================================== FAILURES =================================== 2025-12-04T10:11:57.4972157Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4972229Z 
Traceback (most recent call last): 2025-12-04T10:11:57.4972529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4972601Z method(*args, **kwargs) 2025-12-04T10:11:57.4972892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4972958Z method(*args, **kwargs) 2025-12-04T10:11:57.4973252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4973313Z with policy(): 2025-12-04T10:11:57.4973613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4973678Z raise RuntimeError(msg) 2025-12-04T10:11:57.4974500Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.4974505Z 2025-12-04T10:11:57.4974628Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4975150Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4975161Z 2025-12-04T10:11:57.4975316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4975443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4975539Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4976079Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4976271Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4976346Z graph_break [] 2025-12-04T10:11:57.4976469Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4976562Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4976748Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4977283Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4977346Z graph_break [] 2025-12-04T10:11:57.4977469Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4977562Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4977683Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.4978220Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4978286Z graph_break [] 2025-12-04T10:11:57.4978782Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml - 2025-12-04T10:11:57.4978891Z =========================== short test summary info ============================ 2025-12-04T10:11:57.4980178Z FAILED [0.5338s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.4980183Z 2025-12-04T10:11:57.4980309Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4980824Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4980831Z 2025-12-04T10:11:57.4980986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4981096Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.4981210Z =================== 1 failed, 8 deselected, 2 rerun in 3.03s =================== 2025-12-04T10:11:57.4981267Z Got exit code 1 2025-12-04T10:11:57.4981335Z Retrying single test... 
2025-12-04T10:11:57.4981597Z W1204 09:33:09.985000 38860 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.4981985Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml 2025-12-04T10:11:57.4982080Z ============================= test session starts ============================== 2025-12-04T10:11:57.4982288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.4982357Z cachedir: .pytest_cache 2025-12-04T10:11:57.4982662Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.4982742Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.4982807Z configfile: pytest.ini 2025-12-04T10:11:57.4983119Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.4983321Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.4983892Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4984027Z Running 1 items in this shard 2025-12-04T10:11:57.4984035Z 2025-12-04T10:11:57.4984763Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:11.576369425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4984767Z 2025-12-04T10:11:57.4985064Z [W1204 09:33:20.581721768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4985072Z 2025-12-04T10:11:57.4985365Z [W1204 09:33:20.581983773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4985369Z 2025-12-04T10:11:57.4985657Z [W1204 09:33:20.588022667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4985664Z 2025-12-04T10:11:57.4985956Z [W1204 09:33:20.588639418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4985960Z 2025-12-04T10:11:57.4986246Z [W1204 09:33:20.588820061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4986249Z 2025-12-04T10:11:57.4986547Z [W1204 09:33:20.594316085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4986551Z 2025-12-04T10:11:57.4986842Z [W1204 09:33:20.594870644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4986846Z 2025-12-04T10:11:57.4987138Z [W1204 09:33:20.595034167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4987143Z 2025-12-04T10:11:57.4987234Z ('RERUN', {'yellow': True}) [10.9543s] [100%] 2025-12-04T10:11:57.4987958Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:21.398347004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4987966Z 2025-12-04T10:11:57.4988262Z [W1204 09:33:21.398897023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4988266Z 2025-12-04T10:11:57.4988558Z [W1204 09:33:21.399037876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4988561Z 2025-12-04T10:11:57.4988854Z [W1204 09:33:21.402048727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4988860Z 2025-12-04T10:11:57.4989148Z [W1204 09:33:21.402506385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4989151Z 2025-12-04T10:11:57.4989442Z [W1204 09:33:21.402646787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4989445Z 2025-12-04T10:11:57.4989733Z [W1204 09:33:21.407247326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4989736Z 2025-12-04T10:11:57.4990102Z [W1204 09:33:21.407713024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4990105Z 2025-12-04T10:11:57.4990396Z [W1204 09:33:21.407857206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4990399Z 2025-12-04T10:11:57.4990544Z ('RERUN', {'yellow': True}) [0.5005s] [100%] 2025-12-04T10:11:57.4991269Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:21.897403745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4991273Z 2025-12-04T10:11:57.4991757Z [W1204 09:33:21.897958635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4991766Z 2025-12-04T10:11:57.4992081Z [W1204 09:33:21.898098617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4992084Z 2025-12-04T10:11:57.4992377Z [W1204 09:33:21.901065158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4992381Z 2025-12-04T10:11:57.4992682Z [W1204 09:33:21.901525035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4992685Z 2025-12-04T10:11:57.4992976Z [W1204 09:33:21.901663358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4992979Z 2025-12-04T10:11:57.4993272Z [W1204 09:33:21.906195075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.4993276Z 2025-12-04T10:11:57.4993565Z [W1204 09:33:21.906654353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4993570Z 2025-12-04T10:11:57.4993862Z [W1204 09:33:21.906792356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.4993865Z 2025-12-04T10:11:57.4993937Z FAILED [0.4959s] [100%] 2025-12-04T10:11:57.4993944Z 2025-12-04T10:11:57.4994036Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.4994341Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.4994417Z Traceback (most recent call last): 2025-12-04T10:11:57.4994729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4994795Z method(*args, **kwargs) 2025-12-04T10:11:57.4995089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.4995160Z method(*args, **kwargs) 2025-12-04T10:11:57.4995451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.4995514Z with policy(): 2025-12-04T10:11:57.4995814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.4995879Z raise RuntimeError(msg) 2025-12-04T10:11:57.4996684Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.4996688Z 2025-12-04T10:11:57.4996823Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.4997438Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.4997445Z 2025-12-04T10:11:57.4997611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.4997832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.4997944Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.4998496Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.4998629Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.4998688Z graph_break [] 2025-12-04T10:11:57.4998813Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.4999516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.4999594Z if out == self.unknown_value: 2025-12-04T10:11:57.4999974Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5000057Z Traceback (most recent call last): 2025-12-04T10:11:57.5000403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5000476Z method(*args, **kwargs) 2025-12-04T10:11:57.5000774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5000837Z method(*args, **kwargs) 2025-12-04T10:11:57.5001132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5001191Z with policy(): 2025-12-04T10:11:57.5001486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5001555Z raise RuntimeError(msg) 2025-12-04T10:11:57.5002362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5002370Z 2025-12-04T10:11:57.5002498Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5003017Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5003020Z 2025-12-04T10:11:57.5003181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5003317Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5003416Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5003964Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5004093Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5004155Z graph_break [] 2025-12-04T10:11:57.5004279Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5005046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5005123Z if out == self.unknown_value: 2025-12-04T10:11:57.5005243Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5005402Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5005524Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5006065Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5006131Z graph_break [] 2025-12-04T10:11:57.5006214Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5006508Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5006583Z Traceback (most recent call last): 2025-12-04T10:11:57.5006892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5006968Z method(*args, **kwargs) 2025-12-04T10:11:57.5007259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5007323Z method(*args, **kwargs) 2025-12-04T10:11:57.5007613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5007673Z with policy(): 2025-12-04T10:11:57.5007977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5008041Z raise RuntimeError(msg) 2025-12-04T10:11:57.5008860Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5008871Z 2025-12-04T10:11:57.5008998Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5009510Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5009514Z 2025-12-04T10:11:57.5009673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5009798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5009896Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5010439Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5010566Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5010629Z graph_break [] 2025-12-04T10:11:57.5010749Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5011436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5011507Z if out == self.unknown_value: 2025-12-04T10:11:57.5011628Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5011801Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5011926Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5012465Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5012592Z graph_break [] 2025-12-04T10:11:57.5012720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5012815Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5012936Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5013475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5013542Z graph_break [] 2025-12-04T10:11:57.5014031Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml - 2025-12-04T10:11:57.5014134Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5015421Z FAILED [0.4959s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5015425Z 2025-12-04T10:11:57.5015555Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5016075Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5016079Z 2025-12-04T10:11:57.5016234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5016342Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5016457Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ================== 2025-12-04T10:11:57.5016585Z Got exit code 1 2025-12-04T10:11:57.5025220Z Retrying single test... 2025-12-04T10:11:57.5025605Z W1204 09:33:28.518000 39047 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5026019Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml 2025-12-04T10:11:57.5026140Z ============================= test session starts ============================== 2025-12-04T10:11:57.5026373Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5026446Z cachedir: .pytest_cache 2025-12-04T10:11:57.5026771Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5026858Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5026926Z configfile: pytest.ini 2025-12-04T10:11:57.5027259Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5027393Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5028159Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5028254Z Running 1 items in this shard 2025-12-04T10:11:57.5028259Z 2025-12-04T10:11:57.5029009Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:30.095599455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5029106Z 2025-12-04T10:11:57.5029422Z [W1204 09:33:39.157059392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5029426Z 2025-12-04T10:11:57.5029758Z [W1204 09:33:39.157319097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5029762Z 2025-12-04T10:11:57.5030058Z [W1204 09:33:39.163698926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5030062Z 2025-12-04T10:11:57.5030351Z [W1204 09:33:39.164292726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5030355Z 2025-12-04T10:11:57.5030647Z [W1204 09:33:39.164489579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5030653Z 2025-12-04T10:11:57.5030942Z [W1204 09:33:39.169829230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5030945Z 2025-12-04T10:11:57.5031237Z [W1204 09:33:39.170698485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5031241Z 2025-12-04T10:11:57.5031530Z [W1204 09:33:39.170865868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5031536Z 2025-12-04T10:11:57.5031620Z ('RERUN', {'yellow': True}) [10.9960s] [100%] 2025-12-04T10:11:57.5032378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:40.966166290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5032387Z 2025-12-04T10:11:57.5032678Z [W1204 09:33:40.966687179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5032682Z 2025-12-04T10:11:57.5032972Z [W1204 09:33:40.966824491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5032976Z 2025-12-04T10:11:57.5033262Z [W1204 09:33:40.969659470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5033268Z 2025-12-04T10:11:57.5033562Z [W1204 09:33:40.970119147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5033565Z 2025-12-04T10:11:57.5033855Z [W1204 09:33:40.970260680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5033861Z 2025-12-04T10:11:57.5034151Z [W1204 09:33:40.974637874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5034155Z 2025-12-04T10:11:57.5034442Z [W1204 09:33:40.975086502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5034445Z 2025-12-04T10:11:57.5034735Z [W1204 09:33:40.975221514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5034738Z 2025-12-04T10:11:57.5034892Z ('RERUN', {'yellow': True}) [0.4908s] [100%] 2025-12-04T10:11:57.5035610Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:40.455635559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5035683Z 2025-12-04T10:11:57.5035972Z [W1204 09:33:40.456165199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5035975Z 2025-12-04T10:11:57.5036262Z [W1204 09:33:40.456312511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5036265Z 2025-12-04T10:11:57.5036556Z [W1204 09:33:40.459096058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5036559Z 2025-12-04T10:11:57.5036847Z [W1204 09:33:40.459530056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5036850Z 2025-12-04T10:11:57.5037141Z [W1204 09:33:40.459666068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5037148Z 2025-12-04T10:11:57.5037434Z [W1204 09:33:40.464033353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5037438Z 2025-12-04T10:11:57.5037727Z [W1204 09:33:40.464492771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5037731Z 2025-12-04T10:11:57.5038017Z [W1204 09:33:40.464629483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5038021Z 2025-12-04T10:11:57.5038087Z FAILED [0.4879s] [100%] 2025-12-04T10:11:57.5038095Z 2025-12-04T10:11:57.5038186Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5038483Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5038578Z Traceback (most recent call last): 2025-12-04T10:11:57.5038906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5038976Z method(*args, **kwargs) 2025-12-04T10:11:57.5039280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5039345Z method(*args, **kwargs) 2025-12-04T10:11:57.5039641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5039705Z with policy(): 2025-12-04T10:11:57.5040096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5040172Z raise RuntimeError(msg) 2025-12-04T10:11:57.5041003Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5041010Z 2025-12-04T10:11:57.5041149Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5041669Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5041673Z 2025-12-04T10:11:57.5041926Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5042075Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5042173Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5042730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5043367Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5043467Z graph_break [] 2025-12-04T10:11:57.5043710Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5047811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5047901Z if out == self.unknown_value: 2025-12-04T10:11:57.5048210Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5048290Z Traceback (most recent call last): 2025-12-04T10:11:57.5048665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5048754Z method(*args, **kwargs) 2025-12-04T10:11:57.5049121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5049200Z method(*args, **kwargs) 2025-12-04T10:11:57.5049571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5049646Z with policy(): 2025-12-04T10:11:57.5050020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5050100Z raise RuntimeError(msg) 2025-12-04T10:11:57.5051115Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5051128Z 2025-12-04T10:11:57.5051295Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5051947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5051952Z 2025-12-04T10:11:57.5052155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5052327Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5052450Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5053131Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5053295Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5053371Z graph_break [] 2025-12-04T10:11:57.5053524Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5054386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5054472Z if out == self.unknown_value: 2025-12-04T10:11:57.5054729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5054851Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5055003Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5055783Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5055856Z graph_break [] 2025-12-04T10:11:57.5055957Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5056325Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5056416Z Traceback (most recent call last): 2025-12-04T10:11:57.5056790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5077593Z method(*args, **kwargs) 2025-12-04T10:11:57.5078080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5078237Z method(*args, **kwargs) 2025-12-04T10:11:57.5079087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5079254Z with policy(): 2025-12-04T10:11:57.5080280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5080441Z raise RuntimeError(msg) 2025-12-04T10:11:57.5082418Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5082433Z 2025-12-04T10:11:57.5082746Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5083994Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5084012Z 2025-12-04T10:11:57.5084402Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5084715Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5084948Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5086248Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5086557Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5086702Z graph_break [] 2025-12-04T10:11:57.5087000Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5088889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5089121Z if out == self.unknown_value: 2025-12-04T10:11:57.5089471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5089735Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5090084Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5091862Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5092038Z graph_break [] 2025-12-04T10:11:57.5092379Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5092802Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5093012Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5093617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5093675Z graph_break [] 2025-12-04T10:11:57.5094160Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml - 2025-12-04T10:11:57.5094264Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5095557Z FAILED [0.4879s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5095565Z 2025-12-04T10:11:57.5095692Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5096208Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5096212Z 2025-12-04T10:11:57.5096373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5096490Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5096613Z ================== 1 failed, 57 deselected, 2 rerun in 12.00s ================== 2025-12-04T10:11:57.5096676Z Got exit code 1 2025-12-04T10:11:57.5097155Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5097402Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5097669Z W1204 09:33:47.097000 39233 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5098064Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml 2025-12-04T10:11:57.5098171Z ============================= test session starts ============================== 2025-12-04T10:11:57.5098389Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5098458Z cachedir: .pytest_cache 2025-12-04T10:11:57.5098767Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5098852Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5098916Z configfile: pytest.ini 2025-12-04T10:11:57.5099232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5099364Z collecting ... collected 58 items / 9 deselected / 49 selected 2025-12-04T10:11:57.5099451Z stepcurrent: skipping 9 already run items. 2025-12-04T10:11:57.5099527Z Running 49 items in this shard 2025-12-04T10:11:57.5099530Z 2025-12-04T10:11:57.5100101Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9189s] [ 2%] 2025-12-04T10:11:57.5100595Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5388s] [ 2%] 2025-12-04T10:11:57.5101104Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5365s] [ 2%] 2025-12-04T10:11:57.5101108Z 2025-12-04T10:11:57.5101189Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5101487Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5101563Z Traceback (most recent call last): 2025-12-04T10:11:57.5101877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5101942Z method(*args, **kwargs) 2025-12-04T10:11:57.5102235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.5102307Z method(*args, **kwargs) 2025-12-04T10:11:57.5102596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5102655Z with policy(): 2025-12-04T10:11:57.5102955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5103020Z raise RuntimeError(msg) 2025-12-04T10:11:57.5103824Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5103829Z 2025-12-04T10:11:57.5103957Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5104487Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5104491Z 2025-12-04T10:11:57.5104652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5104780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5104883Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5105430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5105565Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5105625Z graph_break [] 2025-12-04T10:11:57.5105917Z _ 
TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5105996Z Traceback (most recent call last): 2025-12-04T10:11:57.5106295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5106358Z method(*args, **kwargs) 2025-12-04T10:11:57.5106655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5106717Z method(*args, **kwargs) 2025-12-04T10:11:57.5107004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5107224Z with policy(): 2025-12-04T10:11:57.5107519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5107588Z raise RuntimeError(msg) 2025-12-04T10:11:57.5108467Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5108471Z 2025-12-04T10:11:57.5108598Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5109119Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5109122Z 2025-12-04T10:11:57.5109278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5109410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5109502Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5110049Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5110181Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5110250Z graph_break [] 2025-12-04T10:11:57.5110380Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5110468Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5110592Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5111144Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5111200Z graph_break [] 2025-12-04T10:11:57.5111291Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5111584Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5111658Z 
Traceback (most recent call last): 2025-12-04T10:11:57.5111961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5112025Z method(*args, **kwargs) 2025-12-04T10:11:57.5112316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5112382Z method(*args, **kwargs) 2025-12-04T10:11:57.5112670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5112734Z with policy(): 2025-12-04T10:11:57.5113025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5113093Z raise RuntimeError(msg) 2025-12-04T10:11:57.5113923Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5113927Z 2025-12-04T10:11:57.5114056Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5114675Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5114679Z 2025-12-04T10:11:57.5114839Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5115037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5115128Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5115678Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5115805Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5115867Z graph_break [] 2025-12-04T10:11:57.5115992Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5116086Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5116216Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5116761Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5116823Z graph_break [] 2025-12-04T10:11:57.5116943Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5117206Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5117339Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.5117880Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5117941Z graph_break [] 2025-12-04T10:11:57.5118442Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml - 2025-12-04T10:11:57.5118543Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5119832Z FAILED [0.5365s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5119836Z 2025-12-04T10:11:57.5120002Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5120526Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5120530Z 2025-12-04T10:11:57.5120693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5120796Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5120909Z =================== 1 failed, 9 deselected, 2 rerun in 3.02s =================== 2025-12-04T10:11:57.5120973Z Got exit code 1 2025-12-04T10:11:57.5121037Z Retrying single test... 
2025-12-04T10:11:57.5121300Z W1204 09:33:56.766000 39415 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5121700Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml 2025-12-04T10:11:57.5121914Z ============================= test session starts ============================== 2025-12-04T10:11:57.5122130Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5122196Z cachedir: .pytest_cache 2025-12-04T10:11:57.5122592Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5122673Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5122739Z configfile: pytest.ini 2025-12-04T10:11:57.5123060Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5123188Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5123760Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5123836Z Running 1 items in this shard 2025-12-04T10:11:57.5123839Z 2025-12-04T10:11:57.5124568Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:33:58.364642927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5124575Z 2025-12-04T10:11:57.5124879Z [W1204 09:34:07.508603962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5124883Z 2025-12-04T10:11:57.5125173Z [W1204 09:34:07.508866637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5125176Z 2025-12-04T10:11:57.5125473Z [W1204 09:34:07.514919610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5125477Z 2025-12-04T10:11:57.5125765Z [W1204 09:34:07.515533741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5125768Z 2025-12-04T10:11:57.5126059Z [W1204 09:34:07.515715014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5126065Z 2025-12-04T10:11:57.5126352Z [W1204 09:34:07.521246868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5126355Z 2025-12-04T10:11:57.5126643Z [W1204 09:34:07.521770947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5126646Z 2025-12-04T10:11:57.5126939Z [W1204 09:34:07.521934730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5126945Z 2025-12-04T10:11:57.5127031Z ('RERUN', {'yellow': True}) [11.0976s] [100%] 2025-12-04T10:11:57.5127756Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:08.326060414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5127762Z 2025-12-04T10:11:57.5128052Z [W1204 09:34:08.326621193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5128056Z 2025-12-04T10:11:57.5128347Z [W1204 09:34:08.326763596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5128350Z 2025-12-04T10:11:57.5128637Z [W1204 09:34:08.329663705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5128711Z 2025-12-04T10:11:57.5129024Z [W1204 09:34:08.330134104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5129028Z 2025-12-04T10:11:57.5129318Z [W1204 09:34:08.330274326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5129385Z 2025-12-04T10:11:57.5129674Z [W1204 09:34:08.334745593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5129682Z 2025-12-04T10:11:57.5129970Z [W1204 09:34:08.335193161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5129973Z 2025-12-04T10:11:57.5130262Z [W1204 09:34:08.335329273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5130265Z 2025-12-04T10:11:57.5130352Z ('RERUN', {'yellow': True}) [0.5006s] [100%] 2025-12-04T10:11:57.5131070Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:08.824520567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5131077Z 2025-12-04T10:11:57.5131374Z [W1204 09:34:08.825056926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5131377Z 2025-12-04T10:11:57.5131665Z [W1204 09:34:08.825200848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5131668Z 2025-12-04T10:11:57.5131958Z [W1204 09:34:08.828193859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5131961Z 2025-12-04T10:11:57.5132252Z [W1204 09:34:08.828654707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5132255Z 2025-12-04T10:11:57.5132547Z [W1204 09:34:08.828793080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5132553Z 2025-12-04T10:11:57.5132841Z [W1204 09:34:08.833453559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5132845Z 2025-12-04T10:11:57.5133134Z [W1204 09:34:08.833903847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5133137Z 2025-12-04T10:11:57.5133433Z [W1204 09:34:08.834038189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5133436Z 2025-12-04T10:11:57.5133500Z FAILED [0.4986s] [100%] 2025-12-04T10:11:57.5133503Z 2025-12-04T10:11:57.5133588Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5133882Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5133960Z Traceback (most recent call last): 2025-12-04T10:11:57.5134274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5134338Z method(*args, **kwargs) 2025-12-04T10:11:57.5134642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5134706Z method(*args, **kwargs) 2025-12-04T10:11:57.5134998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5135072Z with policy(): 2025-12-04T10:11:57.5135437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5135504Z raise RuntimeError(msg) 2025-12-04T10:11:57.5136305Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5136374Z 2025-12-04T10:11:57.5136501Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5137022Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5137026Z 2025-12-04T10:11:57.5137186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5137318Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5137410Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5137957Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5138094Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5138152Z graph_break [] 2025-12-04T10:11:57.5138279Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5138965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5139038Z if out == self.unknown_value: 2025-12-04T10:11:57.5139332Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5139404Z Traceback (most recent call last): 2025-12-04T10:11:57.5139712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5139774Z method(*args, **kwargs) 2025-12-04T10:11:57.5140062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5140132Z method(*args, **kwargs) 2025-12-04T10:11:57.5140424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5140483Z with policy(): 2025-12-04T10:11:57.5140784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5140859Z raise RuntimeError(msg) 2025-12-04T10:11:57.5141671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5141678Z 2025-12-04T10:11:57.5141804Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5142322Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5142326Z 2025-12-04T10:11:57.5142484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5142706Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5142810Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5143353Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5143568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5143628Z graph_break [] 2025-12-04T10:11:57.5143754Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5144456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5144527Z if out == self.unknown_value: 2025-12-04T10:11:57.5144653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5144748Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5144871Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5145419Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5145478Z graph_break [] 2025-12-04T10:11:57.5145559Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5145850Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5145922Z Traceback (most recent call last): 2025-12-04T10:11:57.5146226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5146294Z method(*args, **kwargs) 2025-12-04T10:11:57.5146584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5146649Z method(*args, **kwargs) 2025-12-04T10:11:57.5146941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5147001Z with policy(): 2025-12-04T10:11:57.5147298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5147361Z raise RuntimeError(msg) 2025-12-04T10:11:57.5148180Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5148185Z 2025-12-04T10:11:57.5148307Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5148831Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5148837Z 2025-12-04T10:11:57.5148992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5149114Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5149206Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5149744Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5149955Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5150016Z graph_break [] 2025-12-04T10:11:57.5150138Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5150827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5150966Z if out == self.unknown_value: 2025-12-04T10:11:57.5151090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5151179Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5151299Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5151842Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5151901Z graph_break [] 2025-12-04T10:11:57.5152022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5152116Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5152235Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5152773Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5152833Z graph_break [] 2025-12-04T10:11:57.5153318Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml - 2025-12-04T10:11:57.5153424Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5154723Z FAILED [0.4986s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5154729Z 2025-12-04T10:11:57.5154857Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5155371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5155374Z 2025-12-04T10:11:57.5155534Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5155636Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5155749Z ================== 1 failed, 57 deselected, 2 rerun in 12.12s ================== 2025-12-04T10:11:57.5155824Z Got exit code 1 2025-12-04T10:11:57.5155891Z Retrying single test... 2025-12-04T10:11:57.5156160Z W1204 09:34:15.555000 39602 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5156546Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml 2025-12-04T10:11:57.5156639Z ============================= test session starts ============================== 2025-12-04T10:11:57.5156849Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5156915Z cachedir: .pytest_cache 2025-12-04T10:11:57.5157294Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5157375Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5157440Z configfile: pytest.ini 2025-12-04T10:11:57.5157824Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5157949Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5158518Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5158595Z Running 1 items in this shard 2025-12-04T10:11:57.5158599Z 2025-12-04T10:11:57.5159343Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:17.144401272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5159346Z 2025-12-04T10:11:57.5159649Z [W1204 09:34:26.278662521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5159656Z 2025-12-04T10:11:57.5160004Z [W1204 09:34:26.278929776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5160008Z 2025-12-04T10:11:57.5160307Z [W1204 09:34:26.285086872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5160310Z 2025-12-04T10:11:57.5160597Z [W1204 09:34:26.285711462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5160600Z 2025-12-04T10:11:57.5160894Z [W1204 09:34:26.285899886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5160898Z 2025-12-04T10:11:57.5161186Z [W1204 09:34:26.291361730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5161192Z 2025-12-04T10:11:57.5161479Z [W1204 09:34:26.291891459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5161485Z 2025-12-04T10:11:57.5161785Z [W1204 09:34:26.292048902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5161788Z 2025-12-04T10:11:57.5161868Z ('RERUN', {'yellow': True}) [11.0728s] [100%] 2025-12-04T10:11:57.5162598Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:27.084410851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5162602Z 2025-12-04T10:11:57.5162892Z [W1204 09:34:27.084933660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5162898Z 2025-12-04T10:11:57.5163190Z [W1204 09:34:27.085072762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5163193Z 2025-12-04T10:11:57.5163478Z [W1204 09:34:27.087915501 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5163480Z 2025-12-04T10:11:57.5163770Z [W1204 09:34:27.088369468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5163773Z 2025-12-04T10:11:57.5164141Z [W1204 09:34:27.088506801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5164148Z 2025-12-04T10:11:57.5164440Z [W1204 09:34:27.093030858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5164527Z 2025-12-04T10:11:57.5164820Z [W1204 09:34:27.093485156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5164825Z 2025-12-04T10:11:57.5165119Z [W1204 09:34:27.093621158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5165126Z 2025-12-04T10:11:57.5165202Z ('RERUN', {'yellow': True}) [0.4927s] [100%] 2025-12-04T10:11:57.5165920Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:27.574981942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5165924Z 2025-12-04T10:11:57.5166217Z [W1204 09:34:27.575529791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5166224Z 2025-12-04T10:11:57.5166509Z [W1204 09:34:27.575669134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5166513Z 2025-12-04T10:11:57.5166804Z [W1204 09:34:27.578544153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5166807Z 2025-12-04T10:11:57.5167095Z [W1204 09:34:27.578988180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5167097Z 2025-12-04T10:11:57.5167393Z [W1204 09:34:27.579125523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5167396Z 2025-12-04T10:11:57.5167681Z [W1204 09:34:27.583512698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5167687Z 2025-12-04T10:11:57.5167973Z [W1204 09:34:27.583967825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5167980Z 2025-12-04T10:11:57.5168267Z [W1204 09:34:27.584105988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5168270Z 2025-12-04T10:11:57.5168331Z FAILED [0.4872s] [100%] 2025-12-04T10:11:57.5168334Z 2025-12-04T10:11:57.5168422Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5168714Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5168790Z Traceback (most recent call last): 2025-12-04T10:11:57.5169095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5169163Z method(*args, **kwargs) 2025-12-04T10:11:57.5169460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5169522Z method(*args, **kwargs) 2025-12-04T10:11:57.5169809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5169883Z with policy(): 2025-12-04T10:11:57.5170179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5170248Z raise RuntimeError(msg) 2025-12-04T10:11:57.5171112Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5171181Z 2025-12-04T10:11:57.5171319Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5171837Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5171841Z 2025-12-04T10:11:57.5172005Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5172129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5172221Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5172771Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5172898Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5172962Z graph_break [] 2025-12-04T10:11:57.5173085Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5173781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5173856Z if out == self.unknown_value: 2025-12-04T10:11:57.5174143Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5174221Z Traceback (most recent call last): 2025-12-04T10:11:57.5174518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5174592Z method(*args, **kwargs) 2025-12-04T10:11:57.5174888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5174953Z method(*args, **kwargs) 2025-12-04T10:11:57.5175240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5175306Z with policy(): 2025-12-04T10:11:57.5175600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5175668Z raise RuntimeError(msg) 2025-12-04T10:11:57.5176473Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5176477Z 2025-12-04T10:11:57.5176600Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5177126Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5177130Z 2025-12-04T10:11:57.5177286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5177413Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5177504Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5178118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5178247Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5178305Z graph_break [] 2025-12-04T10:11:57.5178499Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5179185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5179255Z if out == self.unknown_value: 2025-12-04T10:11:57.5179379Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5179468Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5179592Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5180131Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5180191Z graph_break [] 2025-12-04T10:11:57.5180277Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5180564Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5180639Z Traceback (most recent call last): 2025-12-04T10:11:57.5180935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5180997Z method(*args, **kwargs) 2025-12-04T10:11:57.5181290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5181353Z method(*args, **kwargs) 2025-12-04T10:11:57.5181643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5181707Z with policy(): 2025-12-04T10:11:57.5181998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5182071Z raise RuntimeError(msg) 2025-12-04T10:11:57.5182885Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5182890Z 2025-12-04T10:11:57.5183013Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5183536Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5183540Z 2025-12-04T10:11:57.5183692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5183819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5183908Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5184465Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5184594Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5184653Z graph_break [] 2025-12-04T10:11:57.5184778Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5185537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5185672Z if out == self.unknown_value: 2025-12-04T10:11:57.5185798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5185885Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5186010Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5186550Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5186607Z graph_break [] 2025-12-04T10:11:57.5186731Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5186820Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5186943Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5187477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5187536Z graph_break [] 2025-12-04T10:11:57.5188027Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml - 2025-12-04T10:11:57.5188126Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5189416Z FAILED [0.4872s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5189422Z 2025-12-04T10:11:57.5189547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5190065Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5190068Z 2025-12-04T10:11:57.5190223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5190324Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5190442Z ================== 1 failed, 57 deselected, 2 rerun in 12.08s ================== 2025-12-04T10:11:57.5190502Z Got exit code 1 2025-12-04T10:11:57.5190981Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5191227Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5191488Z W1204 09:34:34.190000 39788 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5191874Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml 2025-12-04T10:11:57.5191967Z ============================= test session starts ============================== 2025-12-04T10:11:57.5192177Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5192315Z cachedir: .pytest_cache 2025-12-04T10:11:57.5192622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5192702Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5192768Z configfile: pytest.ini 2025-12-04T10:11:57.5193179Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5193308Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T10:11:57.5193397Z stepcurrent: skipping 10 already run items. 2025-12-04T10:11:57.5193471Z Running 48 items in this shard 2025-12-04T10:11:57.5193475Z 2025-12-04T10:11:57.5193979Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9162s] [ 2%] 2025-12-04T10:11:57.5194475Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4936s] [ 2%] 2025-12-04T10:11:57.5194932Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4892s] [ 2%] 2025-12-04T10:11:57.5194938Z 2025-12-04T10:11:57.5195019Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5195319Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5195394Z Traceback (most recent call last): 2025-12-04T10:11:57.5195697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5195768Z method(*args, **kwargs) 2025-12-04T10:11:57.5196061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.5196128Z method(*args, **kwargs) 2025-12-04T10:11:57.5196419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5196481Z with policy(): 2025-12-04T10:11:57.5196781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5196846Z raise RuntimeError(msg) 2025-12-04T10:11:57.5197656Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5197660Z 2025-12-04T10:11:57.5197788Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5198306Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5198318Z 2025-12-04T10:11:57.5198478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5198602Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5198699Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5199048Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5199176Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5199238Z graph_break [] 2025-12-04T10:11:57.5199604Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.5199683Z Traceback (most recent call last): 2025-12-04T10:11:57.5200038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5200169Z method(*args, **kwargs) 2025-12-04T10:11:57.5200464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5200525Z method(*args, **kwargs) 2025-12-04T10:11:57.5200812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5200876Z with policy(): 2025-12-04T10:11:57.5201170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5201238Z raise RuntimeError(msg) 2025-12-04T10:11:57.5202061Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5202068Z 2025-12-04T10:11:57.5202190Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5202712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5202716Z 2025-12-04T10:11:57.5202885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5203019Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5203110Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5203463Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5203587Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5203648Z graph_break [] 2025-12-04T10:11:57.5203774Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5203861Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5203980Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5204328Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5204386Z graph_break [] 2025-12-04T10:11:57.5204472Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5204778Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5204852Z Traceback (most recent call last): 2025-12-04T10:11:57.5205155Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5205219Z method(*args, **kwargs) 2025-12-04T10:11:57.5205511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5205578Z method(*args, **kwargs) 2025-12-04T10:11:57.5205866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5205929Z with policy(): 2025-12-04T10:11:57.5206223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5206286Z raise RuntimeError(msg) 2025-12-04T10:11:57.5207184Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5207251Z 2025-12-04T10:11:57.5207376Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5207905Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5207908Z 2025-12-04T10:11:57.5208069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5208193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5208302Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5208652Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5208781Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5208848Z graph_break [] 2025-12-04T10:11:57.5208969Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5209062Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5209182Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5209526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5209583Z graph_break [] 2025-12-04T10:11:57.5209705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5209800Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5209924Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5210260Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5210324Z graph_break [] 2025-12-04T10:11:57.5210809Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml - 2025-12-04T10:11:57.5210912Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5212217Z FAILED [0.4892s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5212222Z 2025-12-04T10:11:57.5212348Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5212872Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5212876Z 2025-12-04T10:11:57.5213029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5213139Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5213252Z ================== 1 failed, 10 deselected, 2 rerun in 2.92s =================== 2025-12-04T10:11:57.5213315Z Got exit code 1 2025-12-04T10:11:57.5213390Z Retrying single test... 
2025-12-04T10:11:57.5213727Z W1204 09:34:43.849000 39976 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5214124Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml 2025-12-04T10:11:57.5214357Z ============================= test session starts ============================== 2025-12-04T10:11:57.5214572Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5214638Z cachedir: .pytest_cache 2025-12-04T10:11:57.5214947Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5215026Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5215090Z configfile: pytest.ini 2025-12-04T10:11:57.5215410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5215544Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5216123Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5216203Z Running 1 items in this shard 2025-12-04T10:11:57.5216207Z 2025-12-04T10:11:57.5216940Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:34:45.167859200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5216944Z 2025-12-04T10:11:57.5217401Z [W1204 09:34:54.130987437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5217406Z 2025-12-04T10:11:57.5217703Z [W1204 09:34:54.131248622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5217706Z 2025-12-04T10:11:57.5217995Z [W1204 09:34:54.137133722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5218005Z 2025-12-04T10:11:57.5218291Z [W1204 09:34:54.137729203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5218294Z 2025-12-04T10:11:57.5218581Z [W1204 09:34:54.137899585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5218584Z 2025-12-04T10:11:57.5218872Z [W1204 09:34:54.143390450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5218876Z 2025-12-04T10:11:57.5219162Z [W1204 09:34:54.143923449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5219165Z 2025-12-04T10:11:57.5219456Z [W1204 09:34:54.144082041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5219462Z 2025-12-04T10:11:57.5219542Z ('RERUN', {'yellow': True}) [10.9045s] [100%] 2025-12-04T10:11:57.5220279Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:34:55.173806576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5220284Z 2025-12-04T10:11:57.5220572Z [W1204 09:34:55.174334455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5220575Z 2025-12-04T10:11:57.5221003Z [W1204 09:34:55.174478037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5221008Z 2025-12-04T10:11:57.5221301Z [W1204 09:34:55.177397707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5221393Z 2025-12-04T10:11:57.5221684Z [W1204 09:34:55.177960356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5221687Z 2025-12-04T10:11:57.5221976Z [W1204 09:34:55.178098729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5221979Z 2025-12-04T10:11:57.5222271Z [W1204 09:34:55.182648047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5222274Z 2025-12-04T10:11:57.5222566Z [W1204 09:34:55.183113265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5222570Z 2025-12-04T10:11:57.5222858Z [W1204 09:34:55.183255018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5222864Z 2025-12-04T10:11:57.5222946Z ('RERUN', {'yellow': True}) [0.4514s] [100%] 2025-12-04T10:11:57.5223672Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:34:55.622542123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5223676Z 2025-12-04T10:11:57.5223967Z [W1204 09:34:55.623061772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5223971Z 2025-12-04T10:11:57.5224259Z [W1204 09:34:55.623200325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5224262Z 2025-12-04T10:11:57.5224550Z [W1204 09:34:55.626146935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5224561Z 2025-12-04T10:11:57.5224848Z [W1204 09:34:55.626706394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5224851Z 2025-12-04T10:11:57.5225138Z [W1204 09:34:55.626851467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5225141Z 2025-12-04T10:11:57.5225435Z [W1204 09:34:55.631423085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5225438Z 2025-12-04T10:11:57.5225728Z [W1204 09:34:55.631888363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5225730Z 2025-12-04T10:11:57.5226022Z [W1204 09:34:55.632026416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5226029Z 2025-12-04T10:11:57.5226091Z FAILED [0.4485s] [100%] 2025-12-04T10:11:57.5226094Z 2025-12-04T10:11:57.5226178Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5226478Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5226551Z Traceback (most recent call last): 2025-12-04T10:11:57.5226862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5226928Z method(*args, **kwargs) 2025-12-04T10:11:57.5227289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5227357Z method(*args, **kwargs) 2025-12-04T10:11:57.5227646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5227776Z with policy(): 2025-12-04T10:11:57.5228070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5228134Z raise RuntimeError(msg) 2025-12-04T10:11:57.5228953Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5228957Z 2025-12-04T10:11:57.5229084Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5229610Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5229616Z 2025-12-04T10:11:57.5229775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5229906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5230001Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5230353Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5230483Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5230541Z graph_break [] 2025-12-04T10:11:57.5230662Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5231363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5231436Z if out == self.unknown_value: 2025-12-04T10:11:57.5231743Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5231820Z Traceback (most recent call last): 2025-12-04T10:11:57.5232118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5232190Z method(*args, **kwargs) 2025-12-04T10:11:57.5232481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5232548Z method(*args, **kwargs) 2025-12-04T10:11:57.5232840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5232898Z with policy(): 2025-12-04T10:11:57.5233198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5233269Z raise RuntimeError(msg) 2025-12-04T10:11:57.5234097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5234106Z 2025-12-04T10:11:57.5234229Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5234825Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5234830Z 2025-12-04T10:11:57.5234991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5235118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5235280Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5235625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5235749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5235813Z graph_break [] 2025-12-04T10:11:57.5235935Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5236633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5236704Z if out == self.unknown_value: 2025-12-04T10:11:57.5236825Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5236921Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5237045Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5237389Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5237454Z graph_break [] 2025-12-04T10:11:57.5237537Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5237837Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5237913Z Traceback (most recent call last): 2025-12-04T10:11:57.5238208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5238275Z method(*args, **kwargs) 2025-12-04T10:11:57.5238568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5238632Z method(*args, **kwargs) 2025-12-04T10:11:57.5238925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5238995Z with policy(): 2025-12-04T10:11:57.5239302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5239366Z raise RuntimeError(msg) 2025-12-04T10:11:57.5240231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5240240Z 2025-12-04T10:11:57.5240362Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5240885Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5240889Z 2025-12-04T10:11:57.5241051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5241174Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5241270Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5241712Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5241841Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5241904Z graph_break [] 2025-12-04T10:11:57.5242037Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5242812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5242886Z if out == self.unknown_value: 2025-12-04T10:11:57.5243010Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5243103Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5243225Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5243568Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5243631Z graph_break [] 2025-12-04T10:11:57.5243751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5243848Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5243967Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5244306Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5244366Z graph_break [] 2025-12-04T10:11:57.5244851Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml - 2025-12-04T10:11:57.5244950Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5246264Z FAILED [0.4485s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5246271Z 2025-12-04T10:11:57.5246392Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5246918Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5246921Z 2025-12-04T10:11:57.5247080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5247185Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5247305Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ================== 2025-12-04T10:11:57.5247361Z Got exit code 1 2025-12-04T10:11:57.5247433Z Retrying single test... 2025-12-04T10:11:57.5247696Z W1204 09:35:02.239000 40169 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5248086Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml 2025-12-04T10:11:57.5248180Z ============================= test session starts ============================== 2025-12-04T10:11:57.5248390Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5248459Z cachedir: .pytest_cache 2025-12-04T10:11:57.5248850Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5248932Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5248997Z configfile: pytest.ini 2025-12-04T10:11:57.5249308Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5249507Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5250078Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5250148Z Running 1 items in this shard 2025-12-04T10:11:57.5250155Z 2025-12-04T10:11:57.5250895Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:35:03.538075638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5250899Z 2025-12-04T10:11:57.5251194Z [W1204 09:35:12.307782340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5251200Z 2025-12-04T10:11:57.5251494Z [W1204 09:35:12.308038615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5251497Z 2025-12-04T10:11:57.5251788Z [W1204 09:35:12.313855174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5251791Z 2025-12-04T10:11:57.5252084Z [W1204 09:35:12.314457804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5252087Z 2025-12-04T10:11:57.5252387Z [W1204 09:35:12.314628887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5252390Z 2025-12-04T10:11:57.5252683Z [W1204 09:35:12.320061900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5252689Z 2025-12-04T10:11:57.5252979Z [W1204 09:35:12.320602359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5252982Z 2025-12-04T10:11:57.5253269Z [W1204 09:35:12.320773092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5253272Z 2025-12-04T10:11:57.5253351Z ('RERUN', {'yellow': True}) [10.6977s] [100%] 2025-12-04T10:11:57.5254081Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:35:13.354008733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5254091Z 2025-12-04T10:11:57.5254380Z [W1204 09:35:13.354529081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5254386Z 2025-12-04T10:11:57.5254675Z [W1204 09:35:13.354673814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5254678Z 2025-12-04T10:11:57.5254974Z [W1204 09:35:13.357532083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5254977Z 2025-12-04T10:11:57.5255266Z [W1204 09:35:13.358082982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5255269Z 2025-12-04T10:11:57.5255633Z [W1204 09:35:13.358221654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5255637Z 2025-12-04T10:11:57.5255925Z [W1204 09:35:13.362670880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5255992Z 2025-12-04T10:11:57.5256283Z [W1204 09:35:13.363124668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5256286Z 2025-12-04T10:11:57.5256577Z [W1204 09:35:13.363263180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5256581Z 2025-12-04T10:11:57.5256672Z ('RERUN', {'yellow': True}) [0.4548s] [100%] 2025-12-04T10:11:57.5257400Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:35:13.804346209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5257405Z 2025-12-04T10:11:57.5257696Z [W1204 09:35:13.804866157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5257709Z 2025-12-04T10:11:57.5257998Z [W1204 09:35:13.805005810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5258001Z 2025-12-04T10:11:57.5258287Z [W1204 09:35:13.807848939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5258290Z 2025-12-04T10:11:57.5258581Z [W1204 09:35:13.808398578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5258584Z 2025-12-04T10:11:57.5258876Z [W1204 09:35:13.808537390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5258879Z 2025-12-04T10:11:57.5259170Z [W1204 09:35:13.812998707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5259173Z 2025-12-04T10:11:57.5259465Z [W1204 09:35:13.813454804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5259467Z 2025-12-04T10:11:57.5259760Z [W1204 09:35:13.813592766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5259763Z 2025-12-04T10:11:57.5259823Z FAILED [0.4494s] [100%] 2025-12-04T10:11:57.5259826Z 2025-12-04T10:11:57.5259908Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5260209Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5260285Z Traceback (most recent call last): 2025-12-04T10:11:57.5260593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5260659Z method(*args, **kwargs) 2025-12-04T10:11:57.5260960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5261032Z method(*args, **kwargs) 2025-12-04T10:11:57.5261320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5261381Z with policy(): 2025-12-04T10:11:57.5261679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5261745Z raise RuntimeError(msg) 2025-12-04T10:11:57.5262639Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5262644Z 2025-12-04T10:11:57.5262838Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5263367Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5263371Z 2025-12-04T10:11:57.5263530Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5263655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5263756Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5264108Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5264240Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5264300Z graph_break [] 2025-12-04T10:11:57.5264422Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5265117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5265187Z if out == self.unknown_value: 2025-12-04T10:11:57.5265483Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5265561Z Traceback (most recent call last): 2025-12-04T10:11:57.5265864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5265932Z method(*args, **kwargs) 2025-12-04T10:11:57.5266226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5266290Z method(*args, **kwargs) 2025-12-04T10:11:57.5266585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5266644Z with policy(): 2025-12-04T10:11:57.5266944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5267008Z raise RuntimeError(msg) 2025-12-04T10:11:57.5267826Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5267830Z 2025-12-04T10:11:57.5267957Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5268478Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5268484Z 2025-12-04T10:11:57.5268647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5268767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5268857Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5269210Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5269405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5269472Z graph_break [] 2025-12-04T10:11:57.5269604Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5270293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5270451Z if out == self.unknown_value: 2025-12-04T10:11:57.5270573Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5270667Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5270790Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5271131Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5271197Z graph_break [] 2025-12-04T10:11:57.5271280Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5271573Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5271652Z Traceback (most recent call last): 2025-12-04T10:11:57.5271953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5272020Z method(*args, **kwargs) 2025-12-04T10:11:57.5272310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5272372Z method(*args, **kwargs) 2025-12-04T10:11:57.5272662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5272721Z with policy(): 2025-12-04T10:11:57.5273014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5273084Z raise RuntimeError(msg) 2025-12-04T10:11:57.5273900Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5273906Z 2025-12-04T10:11:57.5274033Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5274553Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5274557Z 2025-12-04T10:11:57.5274719Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5274842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5274942Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5275290Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5275414Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5275476Z graph_break [] 2025-12-04T10:11:57.5275597Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5276283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5276357Z if out == self.unknown_value: 2025-12-04T10:11:57.5276547Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5276639Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5276765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5277168Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5277233Z graph_break [] 2025-12-04T10:11:57.5277356Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5277443Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5277569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5277904Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5277968Z graph_break [] 2025-12-04T10:11:57.5278455Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml - 2025-12-04T10:11:57.5278554Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5279854Z FAILED [0.4494s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5279858Z 2025-12-04T10:11:57.5280019Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5280550Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5280554Z 2025-12-04T10:11:57.5280711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5280816Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5280930Z ================== 1 failed, 57 deselected, 2 rerun in 11.62s ================== 2025-12-04T10:11:57.5280989Z Got exit code 1 2025-12-04T10:11:57.5281475Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5281723Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5281993Z W1204 09:35:20.448000 40362 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5282381Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml 2025-12-04T10:11:57.5282476Z ============================= test session starts ============================== 2025-12-04T10:11:57.5282694Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5282761Z cachedir: .pytest_cache 2025-12-04T10:11:57.5283065Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5283151Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5283218Z configfile: 
pytest.ini 2025-12-04T10:11:57.5283542Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5283738Z collecting ... collected 58 items / 11 deselected / 47 selected 2025-12-04T10:11:57.5283827Z stepcurrent: skipping 11 already run items. 2025-12-04T10:11:57.5283902Z Running 47 items in this shard 2025-12-04T10:11:57.5283906Z 2025-12-04T10:11:57.5284398Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9262s] [ 2%] 2025-12-04T10:11:57.5284955Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5400s] [ 2%] 2025-12-04T10:11:57.5285395Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.5357s] [ 2%] 2025-12-04T10:11:57.5285399Z 2025-12-04T10:11:57.5285482Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5285779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5285852Z Traceback (most recent call last): 2025-12-04T10:11:57.5286168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5286232Z method(*args, **kwargs) 2025-12-04T10:11:57.5286527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5286594Z method(*args, **kwargs) 2025-12-04T10:11:57.5286883Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5286947Z with policy(): 2025-12-04T10:11:57.5287243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5287310Z raise RuntimeError(msg) 2025-12-04T10:11:57.5288108Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5288115Z 2025-12-04T10:11:57.5288238Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5288758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5288761Z 2025-12-04T10:11:57.5288917Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5289041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5289139Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5289685Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5289816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5289874Z graph_break [] 2025-12-04T10:11:57.5290165Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 
2025-12-04T10:11:57.5290244Z Traceback (most recent call last): 2025-12-04T10:11:57.5290542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5290609Z method(*args, **kwargs) 2025-12-04T10:11:57.5290972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5291036Z method(*args, **kwargs) 2025-12-04T10:11:57.5291329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5291452Z with policy(): 2025-12-04T10:11:57.5291750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5291819Z raise RuntimeError(msg) 2025-12-04T10:11:57.5292632Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5292637Z 2025-12-04T10:11:57.5292761Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5293278Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5293282Z 2025-12-04T10:11:57.5293443Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5293566Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5293659Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5294202Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5294328Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5294391Z graph_break [] 2025-12-04T10:11:57.5294516Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5294605Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5294731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5295272Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5295333Z graph_break [] 2025-12-04T10:11:57.5295421Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5295710Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5295789Z 
Traceback (most recent call last): 2025-12-04T10:11:57.5296101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5296165Z method(*args, **kwargs) 2025-12-04T10:11:57.5296468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5296531Z method(*args, **kwargs) 2025-12-04T10:11:57.5296824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5296889Z with policy(): 2025-12-04T10:11:57.5297196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5297266Z raise RuntimeError(msg) 2025-12-04T10:11:57.5298154Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5298159Z 2025-12-04T10:11:57.5298288Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5298809Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5298894Z 2025-12-04T10:11:57.5299048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5299177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5299268Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5299811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5299937Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5299995Z graph_break [] 2025-12-04T10:11:57.5300124Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5300211Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5300334Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5300878Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5300939Z graph_break [] 2025-12-04T10:11:57.5301062Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5301148Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5301267Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.5301807Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5301868Z graph_break [] 2025-12-04T10:11:57.5302363Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml - 2025-12-04T10:11:57.5302461Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5303748Z FAILED [0.5357s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5303761Z 2025-12-04T10:11:57.5303881Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5304407Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5304415Z 2025-12-04T10:11:57.5304573Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5304678Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5304798Z ================== 1 failed, 11 deselected, 2 rerun in 3.03s =================== 2025-12-04T10:11:57.5304857Z Got exit code 1 2025-12-04T10:11:57.5304921Z Retrying single test... 
2025-12-04T10:11:57.5305258Z W1204 09:35:30.094000 40544 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5305646Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml 2025-12-04T10:11:57.5305808Z ============================= test session starts ============================== 2025-12-04T10:11:57.5306014Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5306081Z cachedir: .pytest_cache 2025-12-04T10:11:57.5306389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5306465Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5306530Z configfile: pytest.ini 2025-12-04T10:11:57.5306856Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5306987Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5307565Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5307637Z Running 1 items in this shard 2025-12-04T10:11:57.5307641Z 2025-12-04T10:11:57.5308367Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:31.685016588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5308376Z 2025-12-04T10:11:57.5308675Z [W1204 09:35:40.600890771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5308678Z 2025-12-04T10:11:57.5308972Z [W1204 09:35:40.601159026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5308975Z 2025-12-04T10:11:57.5309264Z [W1204 09:35:40.607469144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5309270Z 2025-12-04T10:11:57.5309557Z [W1204 09:35:40.608093154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5309560Z 2025-12-04T10:11:57.5309850Z [W1204 09:35:40.608280018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5309854Z 2025-12-04T10:11:57.5310144Z [W1204 09:35:40.613922034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5310148Z 2025-12-04T10:11:57.5310443Z [W1204 09:35:40.614473843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5310446Z 2025-12-04T10:11:57.5310734Z [W1204 09:35:40.614628826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5310740Z 2025-12-04T10:11:57.5310822Z ('RERUN', {'yellow': True}) [10.8677s] [100%] 2025-12-04T10:11:57.5311543Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:41.422313061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5311547Z 2025-12-04T10:11:57.5311836Z [W1204 09:35:41.422880661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5311845Z 2025-12-04T10:11:57.5312200Z [W1204 09:35:41.423019903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5312204Z 2025-12-04T10:11:57.5312491Z [W1204 09:35:41.426065886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5312558Z 2025-12-04T10:11:57.5312852Z [W1204 09:35:41.426522913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5312855Z 2025-12-04T10:11:57.5313144Z [W1204 09:35:41.426660676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5313147Z 2025-12-04T10:11:57.5313443Z [W1204 09:35:41.431423178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5313447Z 2025-12-04T10:11:57.5313740Z [W1204 09:35:41.431890946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5313743Z 2025-12-04T10:11:57.5314037Z [W1204 09:35:41.432027898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5314043Z 2025-12-04T10:11:57.5314121Z ('RERUN', {'yellow': True}) [0.5030s] [100%] 2025-12-04T10:11:57.5314841Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:41.924569509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5314848Z 2025-12-04T10:11:57.5315147Z [W1204 09:35:41.925127528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5315151Z 2025-12-04T10:11:57.5315441Z [W1204 09:35:41.925265421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5315444Z 2025-12-04T10:11:57.5315738Z [W1204 09:35:41.928304833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5315744Z 2025-12-04T10:11:57.5316032Z [W1204 09:35:41.928764361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5316035Z 2025-12-04T10:11:57.5316326Z [W1204 09:35:41.928900933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5316330Z 2025-12-04T10:11:57.5316614Z [W1204 09:35:41.933621874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5316617Z 2025-12-04T10:11:57.5316915Z [W1204 09:35:41.934096252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5316919Z 2025-12-04T10:11:57.5317357Z [W1204 09:35:41.934235395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5317361Z 2025-12-04T10:11:57.5317429Z FAILED [0.4964s] [100%] 2025-12-04T10:11:57.5317432Z 2025-12-04T10:11:57.5317514Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5317806Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5317887Z Traceback (most recent call last): 2025-12-04T10:11:57.5318194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5318260Z method(*args, **kwargs) 2025-12-04T10:11:57.5318679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5318749Z method(*args, **kwargs) 2025-12-04T10:11:57.5319044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5319103Z with policy(): 2025-12-04T10:11:57.5319488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5319560Z raise RuntimeError(msg) 2025-12-04T10:11:57.5320399Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5320404Z 2025-12-04T10:11:57.5320532Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5321051Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5321055Z 2025-12-04T10:11:57.5321217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5321348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5321441Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5321990Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5322120Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5322179Z graph_break [] 2025-12-04T10:11:57.5322319Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5323021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5323097Z if out == self.unknown_value: 2025-12-04T10:11:57.5323389Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5323462Z Traceback (most recent call last): 2025-12-04T10:11:57.5323764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5323828Z method(*args, **kwargs) 2025-12-04T10:11:57.5324126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5324189Z method(*args, **kwargs) 2025-12-04T10:11:57.5324481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5324545Z with policy(): 2025-12-04T10:11:57.5324842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5324910Z raise RuntimeError(msg) 2025-12-04T10:11:57.5325741Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5325745Z 2025-12-04T10:11:57.5325869Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5326487Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5326491Z 2025-12-04T10:11:57.5326650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5326779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5327015Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5327555Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5327696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5327755Z graph_break [] 2025-12-04T10:11:57.5327884Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5328570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5328641Z if out == self.unknown_value: 2025-12-04T10:11:57.5328769Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5328857Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5328978Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5329520Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5329581Z graph_break [] 2025-12-04T10:11:57.5329667Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5329958Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5330035Z Traceback (most recent call last): 2025-12-04T10:11:57.5330340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5330406Z method(*args, **kwargs) 2025-12-04T10:11:57.5330700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5330761Z method(*args, **kwargs) 2025-12-04T10:11:57.5331055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5331119Z with policy(): 2025-12-04T10:11:57.5331410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5331476Z raise RuntimeError(msg) 2025-12-04T10:11:57.5332305Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5332311Z 2025-12-04T10:11:57.5332435Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5332953Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5332957Z 2025-12-04T10:11:57.5333110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5333237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5333326Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5333937Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5334130Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5334188Z graph_break [] 2025-12-04T10:11:57.5334314Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5335002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5335070Z if out == self.unknown_value: 2025-12-04T10:11:57.5335194Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5335287Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5335412Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5335952Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5336012Z graph_break [] 2025-12-04T10:11:57.5336138Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5336225Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5336345Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5337082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5337149Z graph_break [] 2025-12-04T10:11:57.5337650Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml - 2025-12-04T10:11:57.5337750Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5339056Z FAILED [0.4964s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5339061Z 2025-12-04T10:11:57.5339189Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5339715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5339723Z 2025-12-04T10:11:57.5339888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5339993Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5340115Z ================== 1 failed, 57 deselected, 2 rerun in 11.89s ================== 2025-12-04T10:11:57.5340175Z Got exit code 1 2025-12-04T10:11:57.5340240Z Retrying single test... 2025-12-04T10:11:57.5340509Z W1204 09:35:48.533000 40731 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5340898Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml 2025-12-04T10:11:57.5341081Z ============================= test session starts ============================== 2025-12-04T10:11:57.5341291Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5341357Z cachedir: .pytest_cache 2025-12-04T10:11:57.5341735Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5341811Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5341877Z configfile: pytest.ini 2025-12-04T10:11:57.5342196Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5342323Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5342897Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5342969Z Running 1 items in this shard 2025-12-04T10:11:57.5342973Z 2025-12-04T10:11:57.5343704Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:50.120608693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5343711Z 2025-12-04T10:11:57.5344007Z [W1204 09:35:59.127030871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5344011Z 2025-12-04T10:11:57.5344302Z [W1204 09:35:59.127297296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5344309Z 2025-12-04T10:11:57.5344601Z [W1204 09:35:59.133262728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5344605Z 2025-12-04T10:11:57.5344891Z [W1204 09:35:59.133886979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5344894Z 2025-12-04T10:11:57.5345201Z [W1204 09:35:59.134068632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5345205Z 2025-12-04T10:11:57.5345495Z [W1204 09:35:59.139482454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5345498Z 2025-12-04T10:11:57.5345792Z [W1204 09:35:59.140031724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5345795Z 2025-12-04T10:11:57.5346084Z [W1204 09:35:59.140198507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5346090Z 2025-12-04T10:11:57.5346174Z ('RERUN', {'yellow': True}) [10.9491s] [100%] 2025-12-04T10:11:57.5346897Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:59.939979529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5346904Z 2025-12-04T10:11:57.5347196Z [W1204 09:36:00.940560889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5347199Z 2025-12-04T10:11:57.5347489Z [W1204 09:36:00.940709051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5347492Z 2025-12-04T10:11:57.5347859Z [W1204 09:36:00.943701593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5347864Z 2025-12-04T10:11:57.5348166Z [W1204 09:36:00.944156361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5348169Z 2025-12-04T10:11:57.5348458Z [W1204 09:36:00.944320604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5348525Z 2025-12-04T10:11:57.5348820Z [W1204 09:36:00.948920463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5348823Z 2025-12-04T10:11:57.5349113Z [W1204 09:36:00.949373111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5349116Z 2025-12-04T10:11:57.5349409Z [W1204 09:36:00.949508043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5349415Z 2025-12-04T10:11:57.5349493Z ('RERUN', {'yellow': True}) [0.5006s] [100%] 2025-12-04T10:11:57.5350217Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:36:00.440240630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5350223Z 2025-12-04T10:11:57.5350513Z [W1204 09:36:00.440785830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5350517Z 2025-12-04T10:11:57.5350806Z [W1204 09:36:00.440922802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5350815Z 2025-12-04T10:11:57.5351104Z [W1204 09:36:00.443779301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5351111Z 2025-12-04T10:11:57.5351398Z [W1204 09:36:00.444222179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5351402Z 2025-12-04T10:11:57.5351695Z [W1204 09:36:00.444366281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5351700Z 2025-12-04T10:11:57.5351987Z [W1204 09:36:00.448867709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5351990Z 2025-12-04T10:11:57.5352289Z [W1204 09:36:00.449320157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5352292Z 2025-12-04T10:11:57.5352578Z [W1204 09:36:00.449453859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5352581Z 2025-12-04T10:11:57.5352649Z FAILED [0.4991s] [100%] 2025-12-04T10:11:57.5352652Z 2025-12-04T10:11:57.5352740Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5353034Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5353120Z Traceback (most recent call last): 2025-12-04T10:11:57.5353426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5353496Z method(*args, **kwargs) 2025-12-04T10:11:57.5353791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5353853Z method(*args, **kwargs) 2025-12-04T10:11:57.5354148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5354210Z with policy(): 2025-12-04T10:11:57.5354600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5354676Z raise RuntimeError(msg) 2025-12-04T10:11:57.5355474Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5355544Z 2025-12-04T10:11:57.5355677Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5356198Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5356202Z 2025-12-04T10:11:57.5356370Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5356496Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5356590Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5357137Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5357269Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5357334Z graph_break [] 2025-12-04T10:11:57.5357454Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5358153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5358227Z if out == self.unknown_value: 2025-12-04T10:11:57.5358520Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5358596Z Traceback (most recent call last): 2025-12-04T10:11:57.5358895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5358956Z method(*args, **kwargs) 2025-12-04T10:11:57.5359250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5359312Z method(*args, **kwargs) 2025-12-04T10:11:57.5359601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5359665Z with policy(): 2025-12-04T10:11:57.5360041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5360120Z raise RuntimeError(msg) 2025-12-04T10:11:57.5360932Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5360938Z 2025-12-04T10:11:57.5361065Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5361586Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5361590Z 2025-12-04T10:11:57.5361747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5361948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5362045Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5362586Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5362782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5362841Z graph_break [] 2025-12-04T10:11:57.5362967Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5363656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5363727Z if out == self.unknown_value: 2025-12-04T10:11:57.5363856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5363947Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5364084Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5364629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5364689Z graph_break [] 2025-12-04T10:11:57.5364776Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5365069Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5365146Z Traceback (most recent call last): 2025-12-04T10:11:57.5365449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5365513Z method(*args, **kwargs) 2025-12-04T10:11:57.5365810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5365875Z method(*args, **kwargs) 2025-12-04T10:11:57.5366164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5366229Z with policy(): 2025-12-04T10:11:57.5366524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5366594Z raise RuntimeError(msg) 2025-12-04T10:11:57.5367412Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5367415Z 2025-12-04T10:11:57.5367540Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5368061Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5368067Z 2025-12-04T10:11:57.5368229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5368357Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5368449Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5368992Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5369193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5369259Z graph_break [] 2025-12-04T10:11:57.5369386Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5370138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5370209Z if out == self.unknown_value: 2025-12-04T10:11:57.5370337Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5370429Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5370554Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5371095Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5371153Z graph_break [] 2025-12-04T10:11:57.5371284Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5371375Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5371500Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5372041Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5372099Z graph_break [] 2025-12-04T10:11:57.5372603Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml - 2025-12-04T10:11:57.5372704Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5374003Z FAILED [0.4991s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5374010Z 2025-12-04T10:11:57.5374132Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5374655Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5374658Z 2025-12-04T10:11:57.5374818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5374923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5375045Z ================== 1 failed, 57 deselected, 2 rerun in 11.97s ================== 2025-12-04T10:11:57.5375106Z Got exit code 1 2025-12-04T10:11:57.5375583Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5375829Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5376093Z W1204 09:36:07.052000 40918 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5376559Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml 2025-12-04T10:11:57.5376656Z ============================= test session starts ============================== 2025-12-04T10:11:57.5376869Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5377022Z cachedir: .pytest_cache 2025-12-04T10:11:57.5377334Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5377420Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5377486Z configfile: pytest.ini 2025-12-04T10:11:57.5377815Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5377950Z collecting ... collected 58 items / 12 deselected / 46 selected 2025-12-04T10:11:57.5378040Z stepcurrent: skipping 12 already run items. 2025-12-04T10:11:57.5378120Z Running 46 items in this shard 2025-12-04T10:11:57.5378124Z 2025-12-04T10:11:57.5378624Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9332s] [ 2%] 2025-12-04T10:11:57.5379109Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5371s] [ 2%] 2025-12-04T10:11:57.5379563Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5333s] [ 2%] 2025-12-04T10:11:57.5379567Z 2025-12-04T10:11:57.5379652Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5379947Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5380021Z Traceback (most recent call last): 2025-12-04T10:11:57.5380327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5380399Z method(*args, **kwargs) 2025-12-04T10:11:57.5380695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.5380765Z method(*args, **kwargs) 2025-12-04T10:11:57.5381055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5381117Z with policy(): 2025-12-04T10:11:57.5381426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5381491Z raise RuntimeError(msg) 2025-12-04T10:11:57.5382301Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5382305Z 2025-12-04T10:11:57.5382433Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5382951Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5382955Z 2025-12-04T10:11:57.5383121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5383249Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5383345Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5383960Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5384092Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5384158Z graph_break [] 2025-12-04T10:11:57.5384448Z _ 
TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5384596Z Traceback (most recent call last): 2025-12-04T10:11:57.5384904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5384968Z method(*args, **kwargs) 2025-12-04T10:11:57.5385263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5385324Z method(*args, **kwargs) 2025-12-04T10:11:57.5385617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5385681Z with policy(): 2025-12-04T10:11:57.5385977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5386048Z raise RuntimeError(msg) 2025-12-04T10:11:57.5386861Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5386866Z 2025-12-04T10:11:57.5386992Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5387509Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5387512Z 2025-12-04T10:11:57.5387682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5387813Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5387905Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5388454Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5388582Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5388640Z graph_break [] 2025-12-04T10:11:57.5388768Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5388857Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5388977Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5389522Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5389583Z graph_break [] 2025-12-04T10:11:57.5389672Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5389961Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5390032Z 
Traceback (most recent call last): 2025-12-04T10:11:57.5390349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5390414Z method(*args, **kwargs) 2025-12-04T10:11:57.5390715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5390851Z method(*args, **kwargs) 2025-12-04T10:11:57.5391144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5391208Z with policy(): 2025-12-04T10:11:57.5391504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5391635Z raise RuntimeError(msg) 2025-12-04T10:11:57.5392455Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5392460Z 2025-12-04T10:11:57.5392583Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5393111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5393114Z 2025-12-04T10:11:57.5393270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5393401Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5393493Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5394033Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5394165Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5394223Z graph_break [] 2025-12-04T10:11:57.5394346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5394443Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5394574Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5395120Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5395183Z graph_break [] 2025-12-04T10:11:57.5395304Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5395401Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5395522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.5396064Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5396125Z graph_break [] 2025-12-04T10:11:57.5396621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml - 2025-12-04T10:11:57.5396730Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5398026Z FAILED [0.5333s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5398031Z 2025-12-04T10:11:57.5398172Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5398766Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5398770Z 2025-12-04T10:11:57.5398996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5399101Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5399217Z ================== 1 failed, 12 deselected, 2 rerun in 3.03s =================== 2025-12-04T10:11:57.5399285Z Got exit code 1 2025-12-04T10:11:57.5399350Z Retrying single test... 
2025-12-04T10:11:57.5399618Z W1204 09:36:16.688000 41100 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5400049Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml 2025-12-04T10:11:57.5400157Z ============================= test session starts ============================== 2025-12-04T10:11:57.5400377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5400446Z cachedir: .pytest_cache 2025-12-04T10:11:57.5400759Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5400843Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5400908Z configfile: pytest.ini 2025-12-04T10:11:57.5401232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5401363Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5401935Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5402019Z Running 1 items in this shard 2025-12-04T10:11:57.5402024Z 2025-12-04T10:11:57.5402757Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:18.282281481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5402764Z 2025-12-04T10:11:57.5403071Z [W1204 09:36:27.257591219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5403075Z 2025-12-04T10:11:57.5403368Z [W1204 09:36:27.257856694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5403372Z 2025-12-04T10:11:57.5403671Z [W1204 09:36:27.264629780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5403675Z 2025-12-04T10:11:57.5403966Z [W1204 09:36:27.265245591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5403970Z 2025-12-04T10:11:57.5404271Z [W1204 09:36:27.265431384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5404274Z 2025-12-04T10:11:57.5404565Z [W1204 09:36:27.270808317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5404568Z 2025-12-04T10:11:57.5404855Z [W1204 09:36:27.271338486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5404862Z 2025-12-04T10:11:57.5405224Z [W1204 09:36:27.271494948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5405228Z 2025-12-04T10:11:57.5405314Z ('RERUN', {'yellow': True}) [10.9230s] [100%] 2025-12-04T10:11:57.5406044Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:28.072147691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5406132Z 2025-12-04T10:11:57.5406422Z [W1204 09:36:28.072703111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5406426Z 2025-12-04T10:11:57.5406723Z [W1204 09:36:28.072844223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5406726Z 2025-12-04T10:11:57.5407018Z [W1204 09:36:28.075728663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5407021Z 2025-12-04T10:11:57.5407314Z [W1204 09:36:28.076175360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5407318Z 2025-12-04T10:11:57.5407610Z [W1204 09:36:28.076321353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5407613Z 2025-12-04T10:11:57.5407910Z [W1204 09:36:28.080888851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5407913Z 2025-12-04T10:11:57.5408210Z [W1204 09:36:28.081340929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5408213Z 2025-12-04T10:11:57.5408500Z [W1204 09:36:28.081477181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5408506Z 2025-12-04T10:11:57.5408602Z ('RERUN', {'yellow': True}) [0.5020s] [100%] 2025-12-04T10:11:57.5409323Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:28.572773688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5409329Z 2025-12-04T10:11:57.5409629Z [W1204 09:36:28.573326638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5409632Z 2025-12-04T10:11:57.5409918Z [W1204 09:36:28.573468460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5409921Z 2025-12-04T10:11:57.5410214Z [W1204 09:36:28.576310819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5410220Z 2025-12-04T10:11:57.5410507Z [W1204 09:36:28.576754366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5410510Z 2025-12-04T10:11:57.5410804Z [W1204 09:36:28.576889969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5410809Z 2025-12-04T10:11:57.5411097Z [W1204 09:36:28.581345515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5411100Z 2025-12-04T10:11:57.5411386Z [W1204 09:36:28.581800033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5411394Z 2025-12-04T10:11:57.5411680Z [W1204 09:36:28.581935845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5411683Z 2025-12-04T10:11:57.5411817Z FAILED [0.4986s] [100%] 2025-12-04T10:11:57.5411820Z 2025-12-04T10:11:57.5411913Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5412211Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5412359Z Traceback (most recent call last): 2025-12-04T10:11:57.5412668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5412734Z method(*args, **kwargs) 2025-12-04T10:11:57.5413033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5413100Z method(*args, **kwargs) 2025-12-04T10:11:57.5413391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5413459Z with policy(): 2025-12-04T10:11:57.5413754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5413826Z raise RuntimeError(msg) 2025-12-04T10:11:57.5414619Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5414626Z 2025-12-04T10:11:57.5414758Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5415286Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5415289Z 2025-12-04T10:11:57.5415455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5415588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5415685Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5416232Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5416375Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5416435Z graph_break [] 2025-12-04T10:11:57.5416566Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5417438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5417515Z if out == self.unknown_value: 2025-12-04T10:11:57.5417814Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5417897Z Traceback (most recent call last): 2025-12-04T10:11:57.5418205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5418268Z method(*args, **kwargs) 2025-12-04T10:11:57.5418560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5418628Z method(*args, **kwargs) 2025-12-04T10:11:57.5418919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5418979Z with policy(): 2025-12-04T10:11:57.5419395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5419464Z raise RuntimeError(msg) 2025-12-04T10:11:57.5420283Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5420375Z 2025-12-04T10:11:57.5420502Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5421026Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5421030Z 2025-12-04T10:11:57.5421191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5421318Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5421418Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5421962Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5422099Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5422160Z graph_break [] 2025-12-04T10:11:57.5422281Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5422981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5423052Z if out == self.unknown_value: 2025-12-04T10:11:57.5423183Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5423274Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5423398Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5423950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5424011Z graph_break [] 2025-12-04T10:11:57.5424093Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5424386Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5424459Z Traceback (most recent call last): 2025-12-04T10:11:57.5424769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5424833Z method(*args, **kwargs) 2025-12-04T10:11:57.5425129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5425201Z method(*args, **kwargs) 2025-12-04T10:11:57.5425490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5425561Z with policy(): 2025-12-04T10:11:57.5425856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5425920Z raise RuntimeError(msg) 2025-12-04T10:11:57.5426811Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5426815Z 2025-12-04T10:11:57.5426939Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5434555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5434675Z 2025-12-04T10:11:57.5434863Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5435008Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5435111Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5435667Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5435806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5435867Z graph_break [] 2025-12-04T10:11:57.5436008Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5436720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5436795Z if out == self.unknown_value: 2025-12-04T10:11:57.5436931Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5437031Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5437163Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5437714Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5437774Z graph_break [] 2025-12-04T10:11:57.5437903Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5437997Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5438118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5438659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5438716Z graph_break [] 2025-12-04T10:11:57.5439223Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml - 2025-12-04T10:11:57.5439328Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5440707Z FAILED [0.4986s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5440716Z 2025-12-04T10:11:57.5440852Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5441392Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5441396Z 2025-12-04T10:11:57.5441653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5441765Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5441890Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.5442026Z Got exit code 1 2025-12-04T10:11:57.5442093Z Retrying single test... 2025-12-04T10:11:57.5442369Z W1204 09:36:35.239000 41287 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5442758Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml 2025-12-04T10:11:57.5442865Z ============================= test session starts ============================== 2025-12-04T10:11:57.5443077Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5443146Z cachedir: .pytest_cache 2025-12-04T10:11:57.5443464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5443542Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5443609Z configfile: pytest.ini 2025-12-04T10:11:57.5443932Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5444065Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5444639Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5444720Z Running 1 items in this shard 2025-12-04T10:11:57.5444724Z 2025-12-04T10:11:57.5445470Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:36.830662045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5445474Z 2025-12-04T10:11:57.5445781Z [W1204 09:36:46.951336829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5445787Z 2025-12-04T10:11:57.5446078Z [W1204 09:36:46.951595803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5446087Z 2025-12-04T10:11:57.5446375Z [W1204 09:36:46.957503694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5446379Z 2025-12-04T10:11:57.5446666Z [W1204 09:36:46.958091484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5446669Z 2025-12-04T10:11:57.5446964Z [W1204 09:36:46.958275557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5446967Z 2025-12-04T10:11:57.5447255Z [W1204 09:36:46.963729201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5447261Z 2025-12-04T10:11:57.5447555Z [W1204 09:36:46.964270250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5447558Z 2025-12-04T10:11:57.5447847Z [W1204 09:36:46.964443503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5447850Z 2025-12-04T10:11:57.5447935Z ('RERUN', {'yellow': True}) [11.0699s] [100%] 2025-12-04T10:11:57.5448804Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:46.770371191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5448809Z 2025-12-04T10:11:57.5449105Z [W1204 09:36:46.770925301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5449173Z 2025-12-04T10:11:57.5449461Z [W1204 09:36:46.771068763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5449464Z 2025-12-04T10:11:57.5449752Z [W1204 09:36:46.774001864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5449759Z 2025-12-04T10:11:57.5450046Z [W1204 09:36:46.774452242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5450050Z 2025-12-04T10:11:57.5450338Z [W1204 09:36:46.774589124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5450342Z 2025-12-04T10:11:57.5450636Z [W1204 09:36:46.779069492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5450642Z 2025-12-04T10:11:57.5450927Z [W1204 09:36:46.779521619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5450931Z 2025-12-04T10:11:57.5451225Z [W1204 09:36:46.779657171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5451228Z 2025-12-04T10:11:57.5451306Z ('RERUN', {'yellow': True}) [0.5027s] [100%] 2025-12-04T10:11:57.5452031Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:47.271747383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5452035Z 2025-12-04T10:11:57.5452323Z [W1204 09:36:47.272304953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5452329Z 2025-12-04T10:11:57.5452621Z [W1204 09:36:47.272444625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5452625Z 2025-12-04T10:11:57.5452911Z [W1204 09:36:47.275324004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5452914Z 2025-12-04T10:11:57.5453204Z [W1204 09:36:47.275774852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5453207Z 2025-12-04T10:11:57.5453503Z [W1204 09:36:47.275913354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5453506Z 2025-12-04T10:11:57.5453795Z [W1204 09:36:47.280457402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5453801Z 2025-12-04T10:11:57.5454095Z [W1204 09:36:47.280923550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5454098Z 2025-12-04T10:11:57.5454384Z [W1204 09:36:47.281059782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5454387Z 2025-12-04T10:11:57.5454453Z FAILED [0.4987s] [100%] 2025-12-04T10:11:57.5454456Z 2025-12-04T10:11:57.5454540Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5454907Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5454991Z Traceback (most recent call last): 2025-12-04T10:11:57.5455304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5455772Z method(*args, **kwargs) 2025-12-04T10:11:57.5456072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5456139Z method(*args, **kwargs) 2025-12-04T10:11:57.5456437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5456496Z with policy(): 2025-12-04T10:11:57.5456798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5456868Z raise RuntimeError(msg) 2025-12-04T10:11:57.5457671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5457678Z 2025-12-04T10:11:57.5457816Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5458336Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5458340Z 2025-12-04T10:11:57.5458504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5458635Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5458735Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5459289Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5459421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5459488Z graph_break [] 2025-12-04T10:11:57.5459613Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5460310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5460386Z if out == self.unknown_value: 2025-12-04T10:11:57.5460681Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5460763Z Traceback (most recent call last): 2025-12-04T10:11:57.5461063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5461129Z method(*args, **kwargs) 2025-12-04T10:11:57.5461430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5461494Z method(*args, **kwargs) 2025-12-04T10:11:57.5461785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5461850Z with policy(): 2025-12-04T10:11:57.5462157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5462232Z raise RuntimeError(msg) 2025-12-04T10:11:57.5463114Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5463119Z 2025-12-04T10:11:57.5463247Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5463835Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5463839Z 2025-12-04T10:11:57.5463997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5464126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5464220Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5464764Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5464895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5464956Z graph_break [] 2025-12-04T10:11:57.5465086Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5465776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5465858Z if out == self.unknown_value: 2025-12-04T10:11:57.5465991Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5466083Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5466211Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5466755Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5466816Z graph_break [] 2025-12-04T10:11:57.5466905Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5467192Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5467271Z Traceback (most recent call last): 2025-12-04T10:11:57.5467568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5467630Z method(*args, **kwargs) 2025-12-04T10:11:57.5467928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5467997Z method(*args, **kwargs) 2025-12-04T10:11:57.5468286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5468353Z with policy(): 2025-12-04T10:11:57.5468650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5468719Z raise RuntimeError(msg) 2025-12-04T10:11:57.5469529Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5469534Z 2025-12-04T10:11:57.5469663Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5470279Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5470283Z 2025-12-04T10:11:57.5470441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5470633Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5470724Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5471282Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5471408Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5471466Z graph_break [] 2025-12-04T10:11:57.5471593Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5472294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5472368Z if out == self.unknown_value: 2025-12-04T10:11:57.5472493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5472582Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5472713Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5473254Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5473311Z graph_break [] 2025-12-04T10:11:57.5473439Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5473527Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5473655Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5474191Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5474253Z graph_break [] 2025-12-04T10:11:57.5474745Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml - 2025-12-04T10:11:57.5474846Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5476137Z FAILED [0.4987s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5476144Z 2025-12-04T10:11:57.5476273Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5476795Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5476798Z 2025-12-04T10:11:57.5476956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5477059Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5477254Z ================== 1 failed, 57 deselected, 2 rerun in 12.10s ================== 2025-12-04T10:11:57.5477314Z Got exit code 1 2025-12-04T10:11:57.5477802Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5478116Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5478381Z W1204 09:36:53.864000 41474 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5478771Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml 2025-12-04T10:11:57.5478868Z ============================= test session starts ============================== 2025-12-04T10:11:57.5479080Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5479150Z cachedir: .pytest_cache 2025-12-04T10:11:57.5479455Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5479536Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5479606Z configfile: pytest.ini 2025-12-04T10:11:57.5479960Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5480097Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T10:11:57.5480186Z stepcurrent: skipping 13 already run items. 2025-12-04T10:11:57.5480259Z Running 45 items in this shard 2025-12-04T10:11:57.5480263Z 2025-12-04T10:11:57.5480766Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0000s] [ 2%] 2025-12-04T10:11:57.5481260Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6409s] [ 2%] 2025-12-04T10:11:57.5481721Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6613s] [ 2%] 2025-12-04T10:11:57.5481728Z 2025-12-04T10:11:57.5481812Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5482114Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5482188Z Traceback (most recent call last): 2025-12-04T10:11:57.5482495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5482567Z method(*args, **kwargs) 2025-12-04T10:11:57.5482865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.5482933Z method(*args, **kwargs) 2025-12-04T10:11:57.5483221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5483284Z with policy(): 2025-12-04T10:11:57.5483587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5483653Z raise RuntimeError(msg) 2025-12-04T10:11:57.5484470Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5484474Z 2025-12-04T10:11:57.5484676Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5485203Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5485274Z 2025-12-04T10:11:57.5485434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5485561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5485660Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5486008Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5486134Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5486199Z graph_break [] 2025-12-04T10:11:57.5486506Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.5486586Z Traceback (most recent call last): 2025-12-04T10:11:57.5486886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5486952Z method(*args, **kwargs) 2025-12-04T10:11:57.5487254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5487316Z method(*args, **kwargs) 2025-12-04T10:11:57.5487603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5487665Z with policy(): 2025-12-04T10:11:57.5487959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5488028Z raise RuntimeError(msg) 2025-12-04T10:11:57.5488854Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5488861Z 2025-12-04T10:11:57.5488988Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5489513Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5489517Z 2025-12-04T10:11:57.5489676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5489805Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5489898Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5490252Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5490377Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5490437Z graph_break [] 2025-12-04T10:11:57.5490565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5490653Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5490777Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5491121Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5491178Z graph_break [] 2025-12-04T10:11:57.5491268Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5491631Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5491705Z Traceback (most recent call last): 2025-12-04T10:11:57.5492020Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5492166Z method(*args, **kwargs) 2025-12-04T10:11:57.5492458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5492524Z method(*args, **kwargs) 2025-12-04T10:11:57.5492814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5492881Z with policy(): 2025-12-04T10:11:57.5493177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5493242Z raise RuntimeError(msg) 2025-12-04T10:11:57.5494072Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5494078Z 2025-12-04T10:11:57.5494202Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5494727Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5494731Z 2025-12-04T10:11:57.5494885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5495007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5495104Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5495453Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5495591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5495654Z graph_break [] 2025-12-04T10:11:57.5495781Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5495873Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5495996Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5496340Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5496398Z graph_break [] 2025-12-04T10:11:57.5496522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5496618Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5496736Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5497071Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5497135Z graph_break [] 2025-12-04T10:11:57.5497621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml - 2025-12-04T10:11:57.5497724Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5499096Z FAILED [0.6613s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5499102Z 2025-12-04T10:11:57.5499296Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5499817Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5499821Z 2025-12-04T10:11:57.5499973Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5500095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5500214Z ================== 1 failed, 13 deselected, 2 rerun in 3.33s =================== 2025-12-04T10:11:57.5500279Z Got exit code 1 2025-12-04T10:11:57.5500346Z Retrying single test... 
2025-12-04T10:11:57.5500607Z W1204 09:37:03.664000 41663 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5500998Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml 2025-12-04T10:11:57.5501095Z ============================= test session starts ============================== 2025-12-04T10:11:57.5501305Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5501370Z cachedir: .pytest_cache 2025-12-04T10:11:57.5501673Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5501752Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5501818Z configfile: pytest.ini 2025-12-04T10:11:57.5502133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5502268Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5502842Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5502920Z Running 1 items in this shard 2025-12-04T10:11:57.5502923Z 2025-12-04T10:11:57.5503659Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:04.890076577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5503663Z 2025-12-04T10:11:57.5503973Z [W1204 09:37:14.036697559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5503976Z 2025-12-04T10:11:57.5504270Z [W1204 09:37:14.036959223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5504274Z 2025-12-04T10:11:57.5504565Z [W1204 09:37:14.042801364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5504572Z 2025-12-04T10:11:57.5504861Z [W1204 09:37:14.043399834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5504865Z 2025-12-04T10:11:57.5505154Z [W1204 09:37:14.043572227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5505157Z 2025-12-04T10:11:57.5505450Z [W1204 09:37:14.049066391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5505523Z 2025-12-04T10:11:57.5505816Z [W1204 09:37:14.049604530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5505819Z 2025-12-04T10:11:57.5506112Z [W1204 09:37:14.049768083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5506180Z 2025-12-04T10:11:57.5506261Z ('RERUN', {'yellow': True}) [11.1798s] [100%] 2025-12-04T10:11:57.5506991Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:15.402162565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5506996Z 2025-12-04T10:11:57.5507286Z [W1204 09:37:15.402701894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5507292Z 2025-12-04T10:11:57.5507585Z [W1204 09:37:15.402842867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5507588Z 2025-12-04T10:11:57.5507878Z [W1204 09:37:15.405865299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5507884Z 2025-12-04T10:11:57.5508173Z [W1204 09:37:15.406433339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5508180Z 2025-12-04T10:11:57.5508466Z [W1204 09:37:15.406573911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5508469Z 2025-12-04T10:11:57.5508756Z [W1204 09:37:15.411175620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5508759Z 2025-12-04T10:11:57.5509052Z [W1204 09:37:15.411641638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5509055Z 2025-12-04T10:11:57.5509341Z [W1204 09:37:15.411779321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5509347Z 2025-12-04T10:11:57.5509431Z ('RERUN', {'yellow': True}) [0.5986s] [100%] 2025-12-04T10:11:57.5510156Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:16.999664076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5510160Z 2025-12-04T10:11:57.5510450Z [W1204 09:37:16.000219666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5510456Z 2025-12-04T10:11:57.5510740Z [W1204 09:37:16.000371108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5510744Z 2025-12-04T10:11:57.5511032Z [W1204 09:37:16.003387180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5511042Z 2025-12-04T10:11:57.5511333Z [W1204 09:37:16.003949710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5511336Z 2025-12-04T10:11:57.5511622Z [W1204 09:37:16.004088112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5511625Z 2025-12-04T10:11:57.5511920Z [W1204 09:37:16.008789742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5511923Z 2025-12-04T10:11:57.5512279Z [W1204 09:37:16.009251990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5512283Z 2025-12-04T10:11:57.5512577Z [W1204 09:37:16.009388732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5512645Z 2025-12-04T10:11:57.5512707Z FAILED [0.5970s] [100%] 2025-12-04T10:11:57.5512710Z 2025-12-04T10:11:57.5512799Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5513093Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5513169Z Traceback (most recent call last): 2025-12-04T10:11:57.5513480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5513548Z method(*args, **kwargs) 2025-12-04T10:11:57.5513846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5513926Z method(*args, **kwargs) 2025-12-04T10:11:57.5514220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5514294Z with policy(): 2025-12-04T10:11:57.5514591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5514657Z raise RuntimeError(msg) 2025-12-04T10:11:57.5515469Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5515473Z 2025-12-04T10:11:57.5515601Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5516133Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5516139Z 2025-12-04T10:11:57.5516297Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5516436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5516531Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5516883Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5517188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5517253Z graph_break [] 2025-12-04T10:11:57.5517382Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5518080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5518157Z if out == self.unknown_value: 2025-12-04T10:11:57.5518455Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5518527Z Traceback (most recent call last): 2025-12-04T10:11:57.5518825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5518892Z method(*args, **kwargs) 2025-12-04T10:11:57.5519182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5519380Z method(*args, **kwargs) 2025-12-04T10:11:57.5519678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5519740Z with policy(): 2025-12-04T10:11:57.5520075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5520267Z raise RuntimeError(msg) 2025-12-04T10:11:57.5521097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5521106Z 2025-12-04T10:11:57.5521233Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5521760Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5521764Z 2025-12-04T10:11:57.5521925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5522056Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5522153Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5522498Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5522624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5522689Z graph_break [] 2025-12-04T10:11:57.5522815Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5523513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5523586Z if out == self.unknown_value: 2025-12-04T10:11:57.5523713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5523816Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5523945Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5524286Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5524351Z graph_break [] 2025-12-04T10:11:57.5524435Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5524741Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5524816Z Traceback (most recent call last): 2025-12-04T10:11:57.5525117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5525188Z method(*args, **kwargs) 2025-12-04T10:11:57.5525482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5525553Z method(*args, **kwargs) 2025-12-04T10:11:57.5525842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5525899Z with policy(): 2025-12-04T10:11:57.5526200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5526266Z raise RuntimeError(msg) 2025-12-04T10:11:57.5527166Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5527250Z 2025-12-04T10:11:57.5527377Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5527906Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5527910Z 2025-12-04T10:11:57.5528069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5528196Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5528291Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5528635Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5528758Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5528822Z graph_break [] 2025-12-04T10:11:57.5528947Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5529630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5529700Z if out == self.unknown_value: 2025-12-04T10:11:57.5529820Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5529915Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5530036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5530377Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5530438Z graph_break [] 2025-12-04T10:11:57.5530561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5530654Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5530774Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5531112Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5531174Z graph_break [] 2025-12-04T10:11:57.5531673Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml - 2025-12-04T10:11:57.5531776Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5533085Z FAILED [0.5970s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5533092Z 2025-12-04T10:11:57.5533221Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5533745Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5533749Z 2025-12-04T10:11:57.5533974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5534083Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5534196Z ================== 1 failed, 57 deselected, 2 rerun in 12.40s ================== 2025-12-04T10:11:57.5534322Z Got exit code 1 2025-12-04T10:11:57.5534389Z Retrying single test... 2025-12-04T10:11:57.5534650Z W1204 09:37:22.590000 41857 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5535037Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml 2025-12-04T10:11:57.5535131Z ============================= test session starts ============================== 2025-12-04T10:11:57.5535337Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5535407Z cachedir: .pytest_cache 2025-12-04T10:11:57.5535715Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5535794Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5535859Z configfile: pytest.ini 2025-12-04T10:11:57.5536173Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5536304Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5536890Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5536963Z Running 1 items in this shard 2025-12-04T10:11:57.5536972Z 2025-12-04T10:11:57.5537716Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:23.787505701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5537720Z 2025-12-04T10:11:57.5538016Z [W1204 09:37:32.831611167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5538025Z 2025-12-04T10:11:57.5538313Z [W1204 09:37:32.831855751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5538316Z 2025-12-04T10:11:57.5538605Z [W1204 09:37:32.837421867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5538608Z 2025-12-04T10:11:57.5538902Z [W1204 09:37:32.837958926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5538905Z 2025-12-04T10:11:57.5539194Z [W1204 09:37:32.838126429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5539197Z 2025-12-04T10:11:57.5539485Z [W1204 09:37:32.843467702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5539491Z 2025-12-04T10:11:57.5539776Z [W1204 09:37:32.843972360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5539780Z 2025-12-04T10:11:57.5540073Z [W1204 09:37:32.844130633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5540076Z 2025-12-04T10:11:57.5540156Z ('RERUN', {'yellow': True}) [11.0508s] [100%] 2025-12-04T10:11:57.5540956Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:34.201330827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5540965Z 2025-12-04T10:11:57.5541258Z [W1204 09:37:34.201855836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5541325Z 2025-12-04T10:11:57.5541613Z [W1204 09:37:34.201995008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5541616Z 2025-12-04T10:11:57.5541907Z [W1204 09:37:34.204899229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5541910Z 2025-12-04T10:11:57.5542198Z [W1204 09:37:34.205450608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5542201Z 2025-12-04T10:11:57.5542495Z [W1204 09:37:34.205587301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5542498Z 2025-12-04T10:11:57.5542785Z [W1204 09:37:34.210070079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5542791Z 2025-12-04T10:11:57.5543083Z [W1204 09:37:34.210542007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5543086Z 2025-12-04T10:11:57.5543375Z [W1204 09:37:34.210681179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5543378Z 2025-12-04T10:11:57.5543461Z ('RERUN', {'yellow': True}) [0.6000s] [100%] 2025-12-04T10:11:57.5544189Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:34.797474906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5544193Z 2025-12-04T10:11:57.5544480Z [W1204 09:37:34.797998665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5544490Z 2025-12-04T10:11:57.5544776Z [W1204 09:37:34.798150498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5544779Z 2025-12-04T10:11:57.5545065Z [W1204 09:37:34.801073549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5545068Z 2025-12-04T10:11:57.5545364Z [W1204 09:37:34.801618858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5545367Z 2025-12-04T10:11:57.5545660Z [W1204 09:37:34.801757080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5545663Z 2025-12-04T10:11:57.5545952Z [W1204 09:37:34.806222118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5545958Z 2025-12-04T10:11:57.5546243Z [W1204 09:37:34.806675396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5546246Z 2025-12-04T10:11:57.5546538Z [W1204 09:37:34.806813378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5546541Z 2025-12-04T10:11:57.5546602Z FAILED [0.5964s] [100%] 2025-12-04T10:11:57.5546605Z 2025-12-04T10:11:57.5546684Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5547080Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5547158Z Traceback (most recent call last): 2025-12-04T10:11:57.5547471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5547601Z method(*args, **kwargs) 2025-12-04T10:11:57.5547893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5547962Z method(*args, **kwargs) 2025-12-04T10:11:57.5548256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5548318Z with policy(): 2025-12-04T10:11:57.5548615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5548680Z raise RuntimeError(msg) 2025-12-04T10:11:57.5549503Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5549509Z 2025-12-04T10:11:57.5549636Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5550164Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5550167Z 2025-12-04T10:11:57.5550323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5550452Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5550551Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5550904Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5551036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5551098Z graph_break [] 2025-12-04T10:11:57.5551221Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5551917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5551989Z if out == self.unknown_value: 2025-12-04T10:11:57.5552290Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5552363Z Traceback (most recent call last): 2025-12-04T10:11:57.5552660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5552727Z method(*args, **kwargs) 2025-12-04T10:11:57.5553018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5553083Z method(*args, **kwargs) 2025-12-04T10:11:57.5553376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5553434Z with policy(): 2025-12-04T10:11:57.5553730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5553794Z raise RuntimeError(msg) 2025-12-04T10:11:57.5554700Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5554705Z 2025-12-04T10:11:57.5554837Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5555427Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5555431Z 2025-12-04T10:11:57.5555591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5555716Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5555809Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5556158Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5556283Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5556347Z graph_break [] 2025-12-04T10:11:57.5556472Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5557160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5557232Z if out == self.unknown_value: 2025-12-04T10:11:57.5557354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5557446Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5557567Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5557909Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5557974Z graph_break [] 2025-12-04T10:11:57.5558057Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5558349Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5558427Z Traceback (most recent call last): 2025-12-04T10:11:57.5558720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5558787Z method(*args, **kwargs) 2025-12-04T10:11:57.5559080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5559142Z method(*args, **kwargs) 2025-12-04T10:11:57.5559440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5559501Z with policy(): 2025-12-04T10:11:57.5559796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5559860Z raise RuntimeError(msg) 2025-12-04T10:11:57.5560742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5560746Z 2025-12-04T10:11:57.5560880Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5561401Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5561405Z 2025-12-04T10:11:57.5561721Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5561848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5561941Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5562365Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5562490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5562553Z graph_break [] 2025-12-04T10:11:57.5562678Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5563366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5563444Z if out == self.unknown_value: 2025-12-04T10:11:57.5563566Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5563660Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5563784Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5564125Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5564186Z graph_break [] 2025-12-04T10:11:57.5564310Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5564397Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5564522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5564861Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5564925Z graph_break [] 2025-12-04T10:11:57.5565409Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml - 2025-12-04T10:11:57.5565513Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5566816Z FAILED [0.5964s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5566821Z 2025-12-04T10:11:57.5566949Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5567476Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5567482Z 2025-12-04T10:11:57.5567637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5567744Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5567858Z ================== 1 failed, 57 deselected, 2 rerun in 12.27s ================== 2025-12-04T10:11:57.5567918Z Got exit code 1 2025-12-04T10:11:57.5568403Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5568717Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5568985Z W1204 09:37:41.436000 42051 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5569370Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml 2025-12-04T10:11:57.5569529Z ============================= test session starts ============================== 2025-12-04T10:11:57.5569739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5569808Z cachedir: .pytest_cache 2025-12-04T10:11:57.5570117Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5570193Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5570257Z configfile: 
pytest.ini 2025-12-04T10:11:57.5570584Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5570714Z collecting ... collected 58 items / 14 deselected / 44 selected 2025-12-04T10:11:57.5570800Z stepcurrent: skipping 14 already run items. 2025-12-04T10:11:57.5570876Z Running 44 items in this shard 2025-12-04T10:11:57.5570883Z 2025-12-04T10:11:57.5571378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8728s] [ 2%] 2025-12-04T10:11:57.5571871Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4552s] [ 2%] 2025-12-04T10:11:57.5572309Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4565s] [ 2%] 2025-12-04T10:11:57.5572316Z 2025-12-04T10:11:57.5572399Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5572693Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5572768Z Traceback (most recent call last): 2025-12-04T10:11:57.5573079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5573143Z method(*args, **kwargs) 2025-12-04T10:11:57.5573436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5573503Z method(*args, **kwargs) 2025-12-04T10:11:57.5573791Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5573855Z with policy(): 2025-12-04T10:11:57.5574149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5574214Z raise RuntimeError(msg) 2025-12-04T10:11:57.5575018Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5575024Z 2025-12-04T10:11:57.5575147Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5575672Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5575676Z 2025-12-04T10:11:57.5575831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5576048Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5576145Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5576498Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5576692Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5576750Z graph_break [] 2025-12-04T10:11:57.5577038Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5577113Z Traceback (most recent call last): 2025-12-04T10:11:57.5577409Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5577478Z method(*args, **kwargs) 2025-12-04T10:11:57.5577767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5577828Z method(*args, **kwargs) 2025-12-04T10:11:57.5578132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5578197Z with policy(): 2025-12-04T10:11:57.5578491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5578565Z raise RuntimeError(msg) 2025-12-04T10:11:57.5579376Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5579381Z 2025-12-04T10:11:57.5579510Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5580024Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5580031Z 2025-12-04T10:11:57.5580189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5580313Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5580405Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5580753Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5580879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5580939Z graph_break [] 2025-12-04T10:11:57.5581071Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5581159Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5581285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5581621Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5581682Z graph_break [] 2025-12-04T10:11:57.5581768Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5582054Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5582130Z Traceback (most recent call last): 2025-12-04T10:11:57.5582422Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5582484Z method(*args, **kwargs) 2025-12-04T10:11:57.5582846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5582909Z method(*args, **kwargs) 2025-12-04T10:11:57.5583197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5583322Z with policy(): 2025-12-04T10:11:57.5583613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5583681Z raise RuntimeError(msg) 2025-12-04T10:11:57.5584493Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5584496Z 2025-12-04T10:11:57.5584621Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5585139Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5585144Z 2025-12-04T10:11:57.5585297Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5585424Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5585513Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5585853Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5585984Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5586041Z graph_break [] 2025-12-04T10:11:57.5586169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5586257Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5586375Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5586715Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5586774Z graph_break [] 2025-12-04T10:11:57.5586903Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5586996Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5587118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5587472Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5587531Z graph_break [] 2025-12-04T10:11:57.5588022Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml - 2025-12-04T10:11:57.5588128Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5589414Z FAILED [0.4565s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5589418Z 2025-12-04T10:11:57.5589545Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5590132Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5590136Z 2025-12-04T10:11:57.5590293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5590461Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5590575Z ================== 1 failed, 14 deselected, 2 rerun in 2.81s =================== 2025-12-04T10:11:57.5590650Z Got exit code 1 2025-12-04T10:11:57.5590715Z Retrying single test... 
2025-12-04T10:11:57.5590980Z W1204 09:37:51.099000 42239 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5591367Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml 2025-12-04T10:11:57.5591464Z ============================= test session starts ============================== 2025-12-04T10:11:57.5591678Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5591742Z cachedir: .pytest_cache 2025-12-04T10:11:57.5592045Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5592126Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5592190Z configfile: pytest.ini 2025-12-04T10:11:57.5592513Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5592638Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5593208Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5593285Z Running 1 items in this shard 2025-12-04T10:11:57.5593289Z 2025-12-04T10:11:57.5594016Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:37:52.148646446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5594023Z 2025-12-04T10:11:57.5594325Z [W1204 09:38:01.210659416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5594329Z 2025-12-04T10:11:57.5594616Z [W1204 09:38:01.210921841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5594620Z 2025-12-04T10:11:57.5594916Z [W1204 09:38:01.216778892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5594919Z 2025-12-04T10:11:57.5595206Z [W1204 09:38:01.217344071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5595209Z 2025-12-04T10:11:57.5595499Z [W1204 09:38:01.217509394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5595505Z 2025-12-04T10:11:57.5595791Z [W1204 09:38:01.222933818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5595794Z 2025-12-04T10:11:57.5596080Z [W1204 09:38:01.223483867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5596087Z 2025-12-04T10:11:57.5596377Z [W1204 09:38:01.223655690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5596381Z 2025-12-04T10:11:57.5596533Z ('RERUN', {'yellow': True}) [10.9176s] [100%] 2025-12-04T10:11:57.5597264Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:02.396496690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5597356Z 2025-12-04T10:11:57.5597648Z [W1204 09:38:02.397023299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5597651Z 2025-12-04T10:11:57.5597942Z [W1204 09:38:02.397168402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5597945Z 2025-12-04T10:11:57.5598232Z [W1204 09:38:02.400024251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5598235Z 2025-12-04T10:11:57.5598530Z [W1204 09:38:02.400583730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5598534Z 2025-12-04T10:11:57.5598823Z [W1204 09:38:02.400723193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5598829Z 2025-12-04T10:11:57.5599121Z [W1204 09:38:02.405121058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5599125Z 2025-12-04T10:11:57.5599415Z [W1204 09:38:02.405577156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5599418Z 2025-12-04T10:11:57.5599703Z [W1204 09:38:02.405716119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5599711Z 2025-12-04T10:11:57.5599791Z ('RERUN', {'yellow': True}) [0.4168s] [100%] 2025-12-04T10:11:57.5600550Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:02.810225669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5600556Z 2025-12-04T10:11:57.5600851Z [W1204 09:38:02.810750167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5600854Z 2025-12-04T10:11:57.5601141Z [W1204 09:38:02.810890840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5601144Z 2025-12-04T10:11:57.5601432Z [W1204 09:38:02.813718129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5601436Z 2025-12-04T10:11:57.5601725Z [W1204 09:38:02.814261628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5601727Z 2025-12-04T10:11:57.5602016Z [W1204 09:38:02.814402550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5602022Z 2025-12-04T10:11:57.5602306Z [W1204 09:38:02.818760665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5602309Z 2025-12-04T10:11:57.5602595Z [W1204 09:38:02.819206413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5602602Z 2025-12-04T10:11:57.5602903Z [W1204 09:38:02.819343336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5602906Z 2025-12-04T10:11:57.5602968Z FAILED [0.4117s] [100%] 2025-12-04T10:11:57.5602971Z 2025-12-04T10:11:57.5603129Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5603426Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5603606Z Traceback (most recent call last): 2025-12-04T10:11:57.5603909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5603973Z method(*args, **kwargs) 2025-12-04T10:11:57.5604269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5604332Z method(*args, **kwargs) 2025-12-04T10:11:57.5604618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5604680Z with policy(): 2025-12-04T10:11:57.5604977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5605047Z raise RuntimeError(msg) 2025-12-04T10:11:57.5605844Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5605850Z 2025-12-04T10:11:57.5605977Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5606498Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5606503Z 2025-12-04T10:11:57.5606661Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5606798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5606895Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5607240Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5607374Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5607432Z graph_break [] 2025-12-04T10:11:57.5607560Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5608252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5608321Z if out == self.unknown_value: 2025-12-04T10:11:57.5608618Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5608692Z Traceback (most recent call last): 2025-12-04T10:11:57.5608993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5609062Z method(*args, **kwargs) 2025-12-04T10:11:57.5609351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5609421Z method(*args, **kwargs) 2025-12-04T10:11:57.5609715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5609778Z with policy(): 2025-12-04T10:11:57.5610074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5610139Z raise RuntimeError(msg) 2025-12-04T10:11:57.5611028Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5611097Z 2025-12-04T10:11:57.5611223Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5611746Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5611750Z 2025-12-04T10:11:57.5611903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5612028Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5612139Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5612486Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5612614Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5612675Z graph_break [] 2025-12-04T10:11:57.5612798Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5613494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5613565Z if out == self.unknown_value: 2025-12-04T10:11:57.5613688Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5613781Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5613903Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5614251Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5614314Z graph_break [] 2025-12-04T10:11:57.5614398Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5614690Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5614764Z Traceback (most recent call last): 2025-12-04T10:11:57.5615063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5615126Z method(*args, **kwargs) 2025-12-04T10:11:57.5615415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5615483Z method(*args, **kwargs) 2025-12-04T10:11:57.5615776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5615844Z with policy(): 2025-12-04T10:11:57.5616151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5616221Z raise RuntimeError(msg) 2025-12-04T10:11:57.5617229Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5617234Z 2025-12-04T10:11:57.5617367Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5618002Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5618007Z 2025-12-04T10:11:57.5618170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5618389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5618493Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5618844Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5618979Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5619047Z graph_break [] 2025-12-04T10:11:57.5619175Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5619870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5619940Z if out == self.unknown_value: 2025-12-04T10:11:57.5620060Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5620159Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5620282Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5620628Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5620685Z graph_break [] 2025-12-04T10:11:57.5620811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5620904Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5621027Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5621369Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5621430Z graph_break [] 2025-12-04T10:11:57.5621921Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml - 2025-12-04T10:11:57.5622023Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5623318Z FAILED [0.4117s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5623322Z 2025-12-04T10:11:57.5623452Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5623971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5623977Z 2025-12-04T10:11:57.5624139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5624242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5624355Z ================== 1 failed, 57 deselected, 2 rerun in 11.77s ================== 2025-12-04T10:11:57.5624418Z Got exit code 1 2025-12-04T10:11:57.5624484Z Retrying single test... 2025-12-04T10:11:57.5624820Z W1204 09:38:09.509000 42432 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5625207Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml 2025-12-04T10:11:57.5625301Z ============================= test session starts ============================== 2025-12-04T10:11:57.5625597Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5625663Z cachedir: .pytest_cache 2025-12-04T10:11:57.5625966Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5626047Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5626111Z configfile: pytest.ini 2025-12-04T10:11:57.5626427Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5626559Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5627123Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5627202Z Running 1 items in this shard 2025-12-04T10:11:57.5627206Z 2025-12-04T10:11:57.5627935Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:10.562451910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5627939Z 2025-12-04T10:11:57.5628251Z [W1204 09:38:19.833132329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5628255Z 2025-12-04T10:11:57.5628557Z [W1204 09:38:19.833414604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5628561Z 2025-12-04T10:11:57.5628854Z [W1204 09:38:19.839285875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5631890Z 2025-12-04T10:11:57.5632260Z [W1204 09:38:19.839880015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5632265Z 2025-12-04T10:11:57.5632572Z [W1204 09:38:19.840092229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5632576Z 2025-12-04T10:11:57.5632877Z [W1204 09:38:19.845637034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5632881Z 2025-12-04T10:11:57.5633176Z [W1204 09:38:19.846195734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5633180Z 2025-12-04T10:11:57.5633471Z [W1204 09:38:19.846377487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5633476Z 2025-12-04T10:11:57.5633584Z ('RERUN', {'yellow': True}) [11.1268s] [100%] 2025-12-04T10:11:57.5634346Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:21.013635192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5634350Z 2025-12-04T10:11:57.5634650Z [W1204 09:38:21.014157051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5634654Z 2025-12-04T10:11:57.5635045Z [W1204 09:38:21.014301353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5635050Z 2025-12-04T10:11:57.5635346Z [W1204 09:38:21.017179133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5635384Z 2025-12-04T10:11:57.5635674Z [W1204 09:38:21.017725442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5635677Z 2025-12-04T10:11:57.5635968Z [W1204 09:38:21.017862985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5635971Z 2025-12-04T10:11:57.5636256Z [W1204 09:38:21.022275871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5636261Z 2025-12-04T10:11:57.5636553Z [W1204 09:38:21.022728079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5636556Z 2025-12-04T10:11:57.5636841Z [W1204 09:38:21.022865951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5636845Z 2025-12-04T10:11:57.5636926Z ('RERUN', {'yellow': True}) [0.4151s] [100%] 2025-12-04T10:11:57.5637666Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:21.427366203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5637670Z 2025-12-04T10:11:57.5637959Z [W1204 09:38:21.427878992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5637962Z 2025-12-04T10:11:57.5638252Z [W1204 09:38:21.428022865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5638256Z 2025-12-04T10:11:57.5638540Z [W1204 09:38:21.430906015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5638545Z 2025-12-04T10:11:57.5638834Z [W1204 09:38:21.431450284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5638898Z 2025-12-04T10:11:57.5639184Z [W1204 09:38:21.431592366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5639188Z 2025-12-04T10:11:57.5639478Z [W1204 09:38:21.435995622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5639481Z 2025-12-04T10:11:57.5639770Z [W1204 09:38:21.436456230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5639774Z 2025-12-04T10:11:57.5640136Z [W1204 09:38:21.436600763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5640144Z 2025-12-04T10:11:57.5640212Z FAILED [0.4117s] [100%] 2025-12-04T10:11:57.5640217Z 2025-12-04T10:11:57.5640306Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5640614Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5640693Z Traceback (most recent call last): 2025-12-04T10:11:57.5641007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5641080Z method(*args, **kwargs) 2025-12-04T10:11:57.5641370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5641524Z method(*args, **kwargs) 2025-12-04T10:11:57.5641826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5641888Z with policy(): 2025-12-04T10:11:57.5642224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5642293Z raise RuntimeError(msg) 2025-12-04T10:11:57.5643107Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5643112Z 2025-12-04T10:11:57.5643247Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5643776Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5643785Z 2025-12-04T10:11:57.5643948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5644085Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5644189Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5644541Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5644672Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5644738Z graph_break [] 2025-12-04T10:11:57.5644865Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5645576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5645651Z if out == self.unknown_value: 2025-12-04T10:11:57.5645953Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5646099Z Traceback (most recent call last): 2025-12-04T10:11:57.5646401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5646469Z method(*args, **kwargs) 2025-12-04T10:11:57.5646756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5646819Z method(*args, **kwargs) 2025-12-04T10:11:57.5647111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5647171Z with policy(): 2025-12-04T10:11:57.5647466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5647536Z raise RuntimeError(msg) 2025-12-04T10:11:57.5648352Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5648357Z 2025-12-04T10:11:57.5648493Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5649028Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5649102Z 2025-12-04T10:11:57.5649269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5649397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5649491Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5649885Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5650018Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5650078Z graph_break [] 2025-12-04T10:11:57.5650219Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5650927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5651002Z if out == self.unknown_value: 2025-12-04T10:11:57.5651129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5651222Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5651350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5651693Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5651756Z graph_break [] 2025-12-04T10:11:57.5651839Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5652133Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5652211Z Traceback (most recent call last): 2025-12-04T10:11:57.5652516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5652590Z method(*args, **kwargs) 2025-12-04T10:11:57.5652885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5652995Z method(*args, **kwargs) 2025-12-04T10:11:57.5653288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5653349Z with policy(): 2025-12-04T10:11:57.5653640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5653710Z raise RuntimeError(msg) 2025-12-04T10:11:57.5654532Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5654536Z 2025-12-04T10:11:57.5654667Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5655190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5655197Z 2025-12-04T10:11:57.5655359Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5655482Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5655572Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5655920Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5656132Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5656195Z graph_break [] 2025-12-04T10:11:57.5656321Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5657012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5657124Z if out == self.unknown_value: 2025-12-04T10:11:57.5657246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5657335Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5657462Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5657802Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5657865Z graph_break [] 2025-12-04T10:11:57.5657997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5658087Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5658215Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5658561Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5658618Z graph_break [] 2025-12-04T10:11:57.5659116Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml - 2025-12-04T10:11:57.5659214Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5660514Z FAILED [0.4117s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5660561Z 2025-12-04T10:11:57.5660687Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5661215Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5661221Z 2025-12-04T10:11:57.5661501Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5661684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5661813Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ================== 2025-12-04T10:11:57.5661873Z Got exit code 1 2025-12-04T10:11:57.5662352Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5662598Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5662869Z W1204 09:38:28.076000 42625 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5663267Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml 2025-12-04T10:11:57.5663365Z ============================= test session starts ============================== 2025-12-04T10:11:57.5663665Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5663737Z cachedir: .pytest_cache 2025-12-04T10:11:57.5664046Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5664165Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5664233Z configfile: pytest.ini 
2025-12-04T10:11:57.5664550Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5664684Z collecting ... collected 58 items / 15 deselected / 43 selected 2025-12-04T10:11:57.5664773Z stepcurrent: skipping 15 already run items. 2025-12-04T10:11:57.5664847Z Running 43 items in this shard 2025-12-04T10:11:57.5664850Z 2025-12-04T10:11:57.5665348Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9250s] [ 2%] 2025-12-04T10:11:57.5665832Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5379s] [ 2%] 2025-12-04T10:11:57.5666280Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5288s] [ 2%] 2025-12-04T10:11:57.5666287Z 2025-12-04T10:11:57.5666369Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5666669Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5666744Z Traceback (most recent call last): 2025-12-04T10:11:57.5667055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5667124Z method(*args, **kwargs) 2025-12-04T10:11:57.5667414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5667483Z method(*args, **kwargs) 2025-12-04T10:11:57.5667766Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5667876Z with policy(): 2025-12-04T10:11:57.5668179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5668243Z raise RuntimeError(msg) 2025-12-04T10:11:57.5669047Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5669054Z 2025-12-04T10:11:57.5669191Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5669710Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5669721Z 2025-12-04T10:11:57.5669877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5670005Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5670102Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5670651Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5670855Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5670916Z graph_break [] 2025-12-04T10:11:57.5671206Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.5671288Z Traceback (most recent call last): 2025-12-04T10:11:57.5671697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5671763Z method(*args, **kwargs) 2025-12-04T10:11:57.5672053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5672115Z method(*args, **kwargs) 2025-12-04T10:11:57.5672403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5672462Z with policy(): 2025-12-04T10:11:57.5672753Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5672822Z raise RuntimeError(msg) 2025-12-04T10:11:57.5673631Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5673638Z 2025-12-04T10:11:57.5673768Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5674286Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5674290Z 2025-12-04T10:11:57.5674446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5674590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5674682Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5675227Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5675395Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5675458Z graph_break [] 2025-12-04T10:11:57.5675591Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5675680Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5675798Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5676343Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5676402Z graph_break [] 2025-12-04T10:11:57.5676489Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5676776Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5676852Z 
Traceback (most recent call last): 2025-12-04T10:11:57.5677148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5677211Z method(*args, **kwargs) 2025-12-04T10:11:57.5677502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5677564Z method(*args, **kwargs) 2025-12-04T10:11:57.5677850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5677982Z with policy(): 2025-12-04T10:11:57.5678272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5678338Z raise RuntimeError(msg) 2025-12-04T10:11:57.5679197Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5679203Z 2025-12-04T10:11:57.5679326Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5679846Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5679850Z 2025-12-04T10:11:57.5680111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5680245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5680335Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5680878Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5681009Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5681069Z graph_break [] 2025-12-04T10:11:57.5681198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5681284Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5681401Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5681943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5682003Z graph_break [] 2025-12-04T10:11:57.5682199Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5682286Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5682407Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.5682946Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5683005Z graph_break [] 2025-12-04T10:11:57.5683499Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml - 2025-12-04T10:11:57.5683603Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5684891Z FAILED [0.5288s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5684908Z 2025-12-04T10:11:57.5685037Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5685621Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5685626Z 2025-12-04T10:11:57.5685789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5685891Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5686047Z ================== 1 failed, 15 deselected, 2 rerun in 3.02s =================== 2025-12-04T10:11:57.5686107Z Got exit code 1 2025-12-04T10:11:57.5686172Z Retrying single test... 
2025-12-04T10:11:57.5686440Z W1204 09:38:37.784000 42814 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5686825Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml 2025-12-04T10:11:57.5686918Z ============================= test session starts ============================== 2025-12-04T10:11:57.5687132Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5687198Z cachedir: .pytest_cache 2025-12-04T10:11:57.5687511Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5687589Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5687655Z configfile: pytest.ini 2025-12-04T10:11:57.5687987Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5688118Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5688685Z stepcurrent: skipping 15 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5688761Z Running 1 items in this shard 2025-12-04T10:11:57.5688765Z 2025-12-04T10:11:57.5689497Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:39.369785747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5689542Z 2025-12-04T10:11:57.5689844Z [W1204 09:38:48.626508113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5689848Z 2025-12-04T10:11:57.5690135Z [W1204 09:38:48.626764167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5690138Z 2025-12-04T10:11:57.5690429Z [W1204 09:38:48.633079837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5690433Z 2025-12-04T10:11:57.5690720Z [W1204 09:38:48.633686177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5690723Z 2025-12-04T10:11:57.5691015Z [W1204 09:38:48.633875531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5691020Z 2025-12-04T10:11:57.5691306Z [W1204 09:38:48.639243253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5691309Z 2025-12-04T10:11:57.5691599Z [W1204 09:38:48.639771942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5691602Z 2025-12-04T10:11:57.5691888Z [W1204 09:38:48.639931785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5691891Z 2025-12-04T10:11:57.5691972Z ('RERUN', {'yellow': True}) [11.1971s] [100%] 2025-12-04T10:11:57.5692762Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:49.441441919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5692799Z 2025-12-04T10:11:57.5693085Z [W1204 09:38:49.441948728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5693089Z 2025-12-04T10:11:57.5693378Z [W1204 09:38:49.442087940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5693382Z 2025-12-04T10:11:57.5693666Z [W1204 09:38:49.444944450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5693670Z 2025-12-04T10:11:57.5693963Z [W1204 09:38:49.445391577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5693966Z 2025-12-04T10:11:57.5694251Z [W1204 09:38:49.445530340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5694255Z 2025-12-04T10:11:57.5694549Z [W1204 09:38:49.449965366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5694553Z 2025-12-04T10:11:57.5694845Z [W1204 09:38:49.450469055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5694848Z 2025-12-04T10:11:57.5695134Z [W1204 09:38:49.450609747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5695140Z 2025-12-04T10:11:57.5695219Z ('RERUN', {'yellow': True}) [0.5018s] [100%] 2025-12-04T10:11:57.5695948Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:50.940807752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5695954Z 2025-12-04T10:11:57.5696290Z [W1204 09:38:50.941312871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5696293Z 2025-12-04T10:11:57.5696582Z [W1204 09:38:50.941451783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5696585Z 2025-12-04T10:11:57.5696876Z [W1204 09:38:50.944309633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5696879Z 2025-12-04T10:11:57.5697164Z [W1204 09:38:50.944755281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5697167Z 2025-12-04T10:11:57.5697456Z [W1204 09:38:50.944891053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5697459Z 2025-12-04T10:11:57.5697744Z [W1204 09:38:50.949308240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5697749Z 2025-12-04T10:11:57.5698044Z [W1204 09:38:50.949758057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5698048Z 2025-12-04T10:11:57.5698333Z [W1204 09:38:50.949895410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5698336Z 2025-12-04T10:11:57.5698399Z FAILED [0.4980s] [100%] 2025-12-04T10:11:57.5698402Z 2025-12-04T10:11:57.5698557Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5698856Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5698935Z Traceback (most recent call last): 2025-12-04T10:11:57.5699271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5699338Z method(*args, **kwargs) 2025-12-04T10:11:57.5699635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5699698Z method(*args, **kwargs) 2025-12-04T10:11:57.5699986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5700049Z with policy(): 2025-12-04T10:11:57.5700355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5700427Z raise RuntimeError(msg) 2025-12-04T10:11:57.5701225Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5701234Z 2025-12-04T10:11:57.5701364Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5701887Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5701891Z 2025-12-04T10:11:57.5702048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5702183Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5702275Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5702822Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5702994Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5703052Z graph_break [] 2025-12-04T10:11:57.5703183Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5703871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5703944Z if out == self.unknown_value: 2025-12-04T10:11:57.5704237Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5704310Z Traceback (most recent call last): 2025-12-04T10:11:57.5704608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5704686Z method(*args, **kwargs) 2025-12-04T10:11:57.5704975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5705042Z method(*args, **kwargs) 2025-12-04T10:11:57.5705330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5705397Z with policy(): 2025-12-04T10:11:57.5705690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5705756Z raise RuntimeError(msg) 2025-12-04T10:11:57.5706659Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5706714Z 2025-12-04T10:11:57.5706841Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5707366Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5707370Z 2025-12-04T10:11:57.5707525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5707646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5707740Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5708289Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5708475Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5708573Z graph_break [] 2025-12-04T10:11:57.5708732Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5709483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5709586Z if out == self.unknown_value: 2025-12-04T10:11:57.5709743Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5710054Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5710223Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5710797Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5710979Z graph_break [] 2025-12-04T10:11:57.5711098Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5711514Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5711633Z Traceback (most recent call last): 2025-12-04T10:11:57.5711960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5712090Z method(*args, **kwargs) 2025-12-04T10:11:57.5712485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5712598Z method(*args, **kwargs) 2025-12-04T10:11:57.5712981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5713091Z with policy(): 2025-12-04T10:11:57.5713447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5713545Z raise RuntimeError(msg) 2025-12-04T10:11:57.5714392Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5714459Z 2025-12-04T10:11:57.5714679Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5715281Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5715320Z 2025-12-04T10:11:57.5715571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5715732Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5715904Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5716478Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5716619Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5716809Z graph_break [] 2025-12-04T10:11:57.5716963Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5717911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5718034Z if out == self.unknown_value: 2025-12-04T10:11:57.5718191Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5718403Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5718575Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5719276Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5719370Z graph_break [] 2025-12-04T10:11:57.5719523Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5719666Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5719901Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5720600Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5720747Z graph_break [] 2025-12-04T10:11:57.5721267Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml - 2025-12-04T10:11:57.5721432Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5722746Z FAILED [0.4980s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5722753Z 2025-12-04T10:11:57.5723040Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5723598Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5723602Z 2025-12-04T10:11:57.5723825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5724073Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5724226Z ================== 1 failed, 57 deselected, 2 rerun in 12.22s ================== 2025-12-04T10:11:57.5724402Z Got exit code 1 2025-12-04T10:11:57.5724514Z Retrying single test... 2025-12-04T10:11:57.5724863Z W1204 09:38:56.591000 43008 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5725320Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml 2025-12-04T10:11:57.5725444Z ============================= test session starts ============================== 2025-12-04T10:11:57.5725720Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5725864Z cachedir: .pytest_cache 2025-12-04T10:11:57.5726296Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5726458Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5726558Z configfile: pytest.ini 2025-12-04T10:11:57.5726952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5727102Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5727744Z stepcurrent: skipping 15 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5727897Z Running 1 items in this shard 2025-12-04T10:11:57.5727902Z 2025-12-04T10:11:57.5728664Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:58.202269031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5728669Z 2025-12-04T10:11:57.5729033Z [W1204 09:39:07.121005188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5729038Z 2025-12-04T10:11:57.5729423Z [W1204 09:39:07.121265812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5729428Z 2025-12-04T10:11:57.5729810Z [W1204 09:39:07.127646362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5729814Z 2025-12-04T10:11:57.5730144Z [W1204 09:39:07.128246022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5730148Z 2025-12-04T10:11:57.5730503Z [W1204 09:39:07.128443366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5730506Z 2025-12-04T10:11:57.5730821Z [W1204 09:39:07.133822619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5730824Z 2025-12-04T10:11:57.5731158Z [W1204 09:39:07.134364359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5731197Z 2025-12-04T10:11:57.5731500Z [W1204 09:39:07.134530842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5731503Z 2025-12-04T10:11:57.5731670Z ('RERUN', {'yellow': True}) [10.8808s] [100%] 2025-12-04T10:11:57.5732552Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:39:07.932230053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5732556Z 2025-12-04T10:11:57.5732894Z [W1204 09:39:07.932747933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5732931Z 2025-12-04T10:11:57.5733285Z [W1204 09:39:07.932891785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5733291Z 2025-12-04T10:11:57.5733682Z [W1204 09:39:07.935777025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5733686Z 2025-12-04T10:11:57.5734077Z [W1204 09:39:07.936223743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5734081Z 2025-12-04T10:11:57.5734426Z [W1204 09:39:07.936369385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5734430Z 2025-12-04T10:11:57.5734803Z [W1204 09:39:08.940917844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5734806Z 2025-12-04T10:11:57.5735125Z [W1204 09:39:08.941375732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5735130Z 2025-12-04T10:11:57.5735479Z [W1204 09:39:08.941513325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5735482Z 2025-12-04T10:11:57.5735578Z ('RERUN', {'yellow': True}) [0.4985s] [100%] 2025-12-04T10:11:57.5736384Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:39:08.430512485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5736435Z 2025-12-04T10:11:57.5736768Z [W1204 09:39:08.431034654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5736772Z 2025-12-04T10:11:57.5737090Z [W1204 09:39:08.431183296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5737145Z 2025-12-04T10:11:57.5737502Z [W1204 09:39:08.434134587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5737505Z 2025-12-04T10:11:57.5737823Z [W1204 09:39:08.434597965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5737827Z 2025-12-04T10:11:57.5738214Z [W1204 09:39:08.434737968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5738221Z 2025-12-04T10:11:57.5738572Z [W1204 09:39:08.439284336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5738576Z 2025-12-04T10:11:57.5738922Z [W1204 09:39:08.439752894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5738929Z 2025-12-04T10:11:57.5739248Z [W1204 09:39:08.439888916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5739252Z 2025-12-04T10:11:57.5739378Z FAILED [0.4964s] [100%] 2025-12-04T10:11:57.5739382Z 2025-12-04T10:11:57.5739482Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5739855Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5740102Z Traceback (most recent call last): 2025-12-04T10:11:57.5740447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5740617Z method(*args, **kwargs) 2025-12-04T10:11:57.5740971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5741105Z method(*args, **kwargs) 2025-12-04T10:11:57.5741535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5741628Z with policy(): 2025-12-04T10:11:57.5741953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5742086Z raise RuntimeError(msg) 2025-12-04T10:11:57.5742937Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5742942Z 2025-12-04T10:11:57.5743175Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5743757Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5743761Z 2025-12-04T10:11:57.5743994Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5744158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5744301Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5744901Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5745121Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5745257Z graph_break [] 2025-12-04T10:11:57.5745418Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5746221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5746358Z if out == self.unknown_value: 2025-12-04T10:11:57.5746666Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5746876Z Traceback (most recent call last): 2025-12-04T10:11:57.5747209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5747376Z method(*args, **kwargs) 2025-12-04T10:11:57.5747742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5747837Z method(*args, **kwargs) 2025-12-04T10:11:57.5748145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5748330Z with policy(): 2025-12-04T10:11:57.5748652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5748797Z raise RuntimeError(msg) 2025-12-04T10:11:57.5749720Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5749726Z 2025-12-04T10:11:57.5749907Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5750504Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5750544Z 2025-12-04T10:11:57.5750767Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5750959Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5751083Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5751691Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5751835Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5751970Z graph_break [] 2025-12-04T10:11:57.5752188Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5752907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5753043Z if out == self.unknown_value: 2025-12-04T10:11:57.5753194Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5753312Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5753648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5754219Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5754341Z graph_break [] 2025-12-04T10:11:57.5754456Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5754827Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5755015Z Traceback (most recent call last): 2025-12-04T10:11:57.5755355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5755451Z method(*args, **kwargs) 2025-12-04T10:11:57.5755806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5755899Z method(*args, **kwargs) 2025-12-04T10:11:57.5756266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5756405Z with policy(): 2025-12-04T10:11:57.5756742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5756877Z raise RuntimeError(msg) 2025-12-04T10:11:57.5757724Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5757728Z 2025-12-04T10:11:57.5757942Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5758548Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5758552Z 2025-12-04T10:11:57.5758843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5759000Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5759155Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5759777Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5760097Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5760263Z graph_break [] 2025-12-04T10:11:57.5760435Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5761164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5761317Z if out == self.unknown_value: 2025-12-04T10:11:57.5761469Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5761613Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5761816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5762413Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5762554Z graph_break [] 2025-12-04T10:11:57.5762709Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5762828Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5763002Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5763626Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5763846Z graph_break [] 2025-12-04T10:11:57.5764375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml - 2025-12-04T10:11:57.5764547Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5765875Z FAILED [0.4964s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5765882Z 2025-12-04T10:11:57.5766134Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5766705Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5766709Z 2025-12-04T10:11:57.5766912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5767081Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5767301Z ================== 1 failed, 57 deselected, 2 rerun in 11.90s ================== 2025-12-04T10:11:57.5767408Z Got exit code 1 2025-12-04T10:11:57.5768056Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5768372Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5768742Z W1204 09:39:15.093000 43202 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5769160Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml 2025-12-04T10:11:57.5769320Z ============================= test session starts ============================== 2025-12-04T10:11:57.5769544Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5769710Z cachedir: .pytest_cache 2025-12-04T10:11:57.5770101Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5770211Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5770307Z configfile: pytest.ini 2025-12-04T10:11:57.5770691Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5770856Z collecting ... collected 58 items / 16 deselected / 42 selected 2025-12-04T10:11:57.5771069Z stepcurrent: skipping 16 already run items. 2025-12-04T10:11:57.5771170Z Running 42 items in this shard 2025-12-04T10:11:57.5771175Z 2025-12-04T10:11:57.5771706Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8864s] [ 2%] 2025-12-04T10:11:57.5772264Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4925s] [ 2%] 2025-12-04T10:11:57.5772754Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4802s] [ 2%] 2025-12-04T10:11:57.5772798Z 2025-12-04T10:11:57.5772983Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5773319Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5773540Z Traceback (most recent call last): 2025-12-04T10:11:57.5773877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5773990Z method(*args, **kwargs) 2025-12-04T10:11:57.5774333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.5774481Z method(*args, **kwargs) 2025-12-04T10:11:57.5774845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5774936Z with policy(): 2025-12-04T10:11:57.5775259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5775419Z raise RuntimeError(msg) 2025-12-04T10:11:57.5776238Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5776243Z 2025-12-04T10:11:57.5776495Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5777120Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5777124Z 2025-12-04T10:11:57.5777328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5777568Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5777694Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5778144Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5778324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5778431Z graph_break [] 2025-12-04T10:11:57.5778790Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.5778897Z Traceback (most recent call last): 2025-12-04T10:11:57.5779249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5779393Z method(*args, **kwargs) 2025-12-04T10:11:57.5779796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5779955Z method(*args, **kwargs) 2025-12-04T10:11:57.5780273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5780363Z with policy(): 2025-12-04T10:11:57.5780705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5780876Z raise RuntimeError(msg) 2025-12-04T10:11:57.5781797Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5781802Z 2025-12-04T10:11:57.5781960Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5782596Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5782600Z 2025-12-04T10:11:57.5782786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5782933Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5783171Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5783549Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5783744Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5783832Z graph_break [] 2025-12-04T10:11:57.5783987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5784189Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5784354Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5784760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5784851Z graph_break [] 2025-12-04T10:11:57.5784966Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5785321Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5785626Z Traceback (most recent call last): 2025-12-04T10:11:57.5785996Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5786128Z method(*args, **kwargs) 2025-12-04T10:11:57.5786493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5786659Z method(*args, **kwargs) 2025-12-04T10:11:57.5786969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5787117Z with policy(): 2025-12-04T10:11:57.5787496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5787595Z raise RuntimeError(msg) 2025-12-04T10:11:57.5788511Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5788515Z 2025-12-04T10:11:57.5788672Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5789211Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5789292Z 2025-12-04T10:11:57.5789496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5789655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5789825Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5790203Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5790357Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5790508Z graph_break [] 2025-12-04T10:11:57.5790678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5790890Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5791040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5791483Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5791601Z graph_break [] 2025-12-04T10:11:57.5791803Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5791967Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5792140Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5792511Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5792637Z graph_break [] 2025-12-04T10:11:57.5793154Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml - 2025-12-04T10:11:57.5793378Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5794809Z FAILED [0.4802s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5794814Z 2025-12-04T10:11:57.5795007Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5795559Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5795598Z 2025-12-04T10:11:57.5795784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5795987Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5796165Z ================== 1 failed, 16 deselected, 2 rerun in 2.88s =================== 2025-12-04T10:11:57.5796288Z Got exit code 1 2025-12-04T10:11:57.5796383Z Retrying single test... 
2025-12-04T10:11:57.5796679Z W1204 09:39:24.745000 43390 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5797129Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml 2025-12-04T10:11:57.5797321Z ============================= test session starts ============================== 2025-12-04T10:11:57.5797581Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5797784Z cachedir: .pytest_cache 2025-12-04T10:11:57.5798120Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5798259Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5798345Z configfile: pytest.ini 2025-12-04T10:11:57.5798774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5798988Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5799592Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5799772Z Running 1 items in this shard 2025-12-04T10:11:57.5799776Z 2025-12-04T10:11:57.5800610Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:25.832083876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5800615Z 2025-12-04T10:11:57.5801020Z [W1204 09:39:34.861239927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5801025Z 2025-12-04T10:11:57.5801371Z [W1204 09:39:34.861491871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5801375Z 2025-12-04T10:11:57.5801697Z [W1204 09:39:34.867303240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5801737Z 2025-12-04T10:11:57.5802071Z [W1204 09:39:34.867885430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5802075Z 2025-12-04T10:11:57.5802396Z [W1204 09:39:34.868055873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5802400Z 2025-12-04T10:11:57.5802754Z [W1204 09:39:34.873513866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5802758Z 2025-12-04T10:11:57.5803193Z [W1204 09:39:34.874039615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5803197Z 2025-12-04T10:11:57.5803569Z [W1204 09:39:34.874194328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5803606Z 2025-12-04T10:11:57.5803718Z ('RERUN', {'yellow': True}) [10.9214s] [100%] 2025-12-04T10:11:57.5804531Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:36.083535800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5804535Z 2025-12-04T10:11:57.5804856Z [W1204 09:39:36.084066819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5804859Z 2025-12-04T10:11:57.5805247Z [W1204 09:39:36.084205092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5805251Z 2025-12-04T10:11:57.5805659Z [W1204 09:39:36.087088121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5805665Z 2025-12-04T10:11:57.5806033Z [W1204 09:39:36.087642560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5806037Z 2025-12-04T10:11:57.5806359Z [W1204 09:39:36.087781593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5806362Z 2025-12-04T10:11:57.5806680Z [W1204 09:39:36.092382871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5806717Z 2025-12-04T10:11:57.5807025Z [W1204 09:39:36.092846719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5807028Z 2025-12-04T10:11:57.5807390Z [W1204 09:39:36.092984652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5807394Z 2025-12-04T10:11:57.5807559Z ('RERUN', {'yellow': True}) [0.4527s] [100%] 2025-12-04T10:11:57.5808369Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:36.535540620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5808373Z 2025-12-04T10:11:57.5808728Z [W1204 09:39:36.536062009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5808732Z 2025-12-04T10:11:57.5809054Z [W1204 09:39:36.536201771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5809058Z 2025-12-04T10:11:57.5809445Z [W1204 09:39:36.539098391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5809449Z 2025-12-04T10:11:57.5809784Z [W1204 09:39:36.539645970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5809789Z 2025-12-04T10:11:57.5810161Z [W1204 09:39:36.539782842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5810165Z 2025-12-04T10:11:57.5810481Z [W1204 09:39:36.544248438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5810484Z 2025-12-04T10:11:57.5810872Z [W1204 09:39:36.544720936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5810911Z 2025-12-04T10:11:57.5811218Z [W1204 09:39:36.544858859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5811221Z 2025-12-04T10:11:57.5811438Z FAILED [0.4487s] [100%] 2025-12-04T10:11:57.5811444Z 2025-12-04T10:11:57.5811608Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5811940Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5812079Z Traceback (most recent call last): 2025-12-04T10:11:57.5812503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5812588Z method(*args, **kwargs) 2025-12-04T10:11:57.5813024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5813125Z method(*args, **kwargs) 2025-12-04T10:11:57.5813445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5813571Z with policy(): 2025-12-04T10:11:57.5813896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5814075Z raise RuntimeError(msg) 2025-12-04T10:11:57.5814931Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5814935Z 2025-12-04T10:11:57.5815129Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5815690Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5815694Z 2025-12-04T10:11:57.5815883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5816122Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5816290Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5816716Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5816886Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5816977Z graph_break [] 2025-12-04T10:11:57.5817356Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5818086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5818301Z if out == self.unknown_value: 2025-12-04T10:11:57.5818636Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5818746Z Traceback (most recent call last): 2025-12-04T10:11:57.5819205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5819301Z method(*args, **kwargs) 2025-12-04T10:11:57.5819689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5819805Z method(*args, **kwargs) 2025-12-04T10:11:57.5820241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5820389Z with policy(): 2025-12-04T10:11:57.5820714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5820812Z raise RuntimeError(msg) 2025-12-04T10:11:57.5825762Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5825773Z 2025-12-04T10:11:57.5825952Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5826504Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5826516Z 2025-12-04T10:11:57.5826695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5826834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5826937Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5827297Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5827428Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5827492Z graph_break [] 2025-12-04T10:11:57.5827623Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5828335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5828412Z if out == self.unknown_value: 2025-12-04T10:11:57.5828539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5828637Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5828872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5829219Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5829282Z graph_break [] 2025-12-04T10:11:57.5829366Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5829669Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5829750Z Traceback (most recent call last): 2025-12-04T10:11:57.5830068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5830142Z method(*args, **kwargs) 2025-12-04T10:11:57.5830436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5830503Z method(*args, **kwargs) 2025-12-04T10:11:57.5830825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5830887Z with policy(): 2025-12-04T10:11:57.5831192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5831274Z raise RuntimeError(msg) 2025-12-04T10:11:57.5832189Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5832194Z 2025-12-04T10:11:57.5832337Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5832864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5832904Z 2025-12-04T10:11:57.5833076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5833214Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5833314Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5833676Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5833809Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5833872Z graph_break [] 2025-12-04T10:11:57.5833999Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5834707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5834793Z if out == self.unknown_value: 2025-12-04T10:11:57.5834918Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5835009Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5835141Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5835488Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5835551Z graph_break [] 2025-12-04T10:11:57.5835674Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5835764Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5835895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5836275Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5836350Z graph_break [] 2025-12-04T10:11:57.5836853Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml - 2025-12-04T10:11:57.5836959Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5838275Z FAILED [0.4487s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5838282Z 2025-12-04T10:11:57.5838411Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5838944Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5838948Z 2025-12-04T10:11:57.5839106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5839305Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5839424Z ================== 1 failed, 57 deselected, 2 rerun in 11.85s ================== 2025-12-04T10:11:57.5839482Z Got exit code 1 2025-12-04T10:11:57.5839552Z Retrying single test... 2025-12-04T10:11:57.5839820Z W1204 09:39:43.207000 43583 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5840318Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml 2025-12-04T10:11:57.5840419Z ============================= test session starts ============================== 2025-12-04T10:11:57.5840631Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5840701Z cachedir: .pytest_cache 2025-12-04T10:11:57.5841013Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5841093Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5841172Z configfile: pytest.ini 2025-12-04T10:11:57.5841495Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5841634Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5842211Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5842280Z Running 1 items in this shard 2025-12-04T10:11:57.5842285Z 2025-12-04T10:11:57.5843025Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:44.283836033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5843029Z 2025-12-04T10:11:57.5843333Z [W1204 09:39:53.321639950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5843336Z 2025-12-04T10:11:57.5843631Z [W1204 09:39:53.321875844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5843678Z 2025-12-04T10:11:57.5843967Z [W1204 09:39:53.327348429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5843970Z 2025-12-04T10:11:57.5844269Z [W1204 09:39:53.327857197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5844274Z 2025-12-04T10:11:57.5844568Z [W1204 09:39:53.328017870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5844575Z 2025-12-04T10:11:57.5844869Z [W1204 09:39:53.333251930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5844872Z 2025-12-04T10:11:57.5845156Z [W1204 09:39:53.333757788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5845163Z 2025-12-04T10:11:57.5845450Z [W1204 09:39:53.333912341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5845458Z 2025-12-04T10:11:57.5845538Z ('RERUN', {'yellow': True}) [10.9202s] [100%] 2025-12-04T10:11:57.5846264Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:54.546756122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5846334Z 2025-12-04T10:11:57.5846629Z [W1204 09:39:54.547304991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5846633Z 2025-12-04T10:11:57.5846919Z [W1204 09:39:54.547449693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5846956Z 2025-12-04T10:11:57.5847251Z [W1204 09:39:54.550500736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5847254Z 2025-12-04T10:11:57.5847543Z [W1204 09:39:54.551079386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5847547Z 2025-12-04T10:11:57.5847840Z [W1204 09:39:54.551217388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5847844Z 2025-12-04T10:11:57.5848132Z [W1204 09:39:54.555826056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5848135Z 2025-12-04T10:11:57.5848421Z [W1204 09:39:54.556293915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5848431Z 2025-12-04T10:11:57.5848715Z [W1204 09:39:54.556444867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5848718Z 2025-12-04T10:11:57.5848796Z ('RERUN', {'yellow': True}) [0.4542s] [100%] 2025-12-04T10:11:57.5849525Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:55.997789726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5849531Z 2025-12-04T10:11:57.5849820Z [W1204 09:39:55.998331565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5849824Z 2025-12-04T10:11:57.5850115Z [W1204 09:39:55.998477677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5850155Z 2025-12-04T10:11:57.5850442Z [W1204 09:39:55.001517150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5850445Z 2025-12-04T10:11:57.5850738Z [W1204 09:39:55.002087879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5850741Z 2025-12-04T10:11:57.5851026Z [W1204 09:39:55.002229192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5851030Z 2025-12-04T10:11:57.5851324Z [W1204 09:39:55.006832931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5851328Z 2025-12-04T10:11:57.5851615Z [W1204 09:39:55.007295008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5851621Z 2025-12-04T10:11:57.5851908Z [W1204 09:39:55.007433051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5851918Z 2025-12-04T10:11:57.5851980Z FAILED [0.4493s] [100%] 2025-12-04T10:11:57.5851983Z 2025-12-04T10:11:57.5852068Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5852374Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5852450Z Traceback (most recent call last): 2025-12-04T10:11:57.5853189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5853267Z method(*args, **kwargs) 2025-12-04T10:11:57.5853563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5853666Z method(*args, **kwargs) 2025-12-04T10:11:57.5853955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5854016Z with policy(): 2025-12-04T10:11:57.5854325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5854392Z raise RuntimeError(msg) 2025-12-04T10:11:57.5855211Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5855215Z 2025-12-04T10:11:57.5855344Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5855873Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5855883Z 2025-12-04T10:11:57.5856047Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5856176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5856279Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5856636Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5856767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5856832Z graph_break [] 2025-12-04T10:11:57.5856955Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5857658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5857769Z if out == self.unknown_value: 2025-12-04T10:11:57.5858063Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5858146Z Traceback (most recent call last): 2025-12-04T10:11:57.5858446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5858515Z method(*args, **kwargs) 2025-12-04T10:11:57.5858812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5858876Z method(*args, **kwargs) 2025-12-04T10:11:57.5859165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5859226Z with policy(): 2025-12-04T10:11:57.5859517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5859587Z raise RuntimeError(msg) 2025-12-04T10:11:57.5860408Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5860412Z 2025-12-04T10:11:57.5860610Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5861135Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5861173Z 2025-12-04T10:11:57.5861335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5861461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5861555Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5861906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5862034Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5862093Z graph_break [] 2025-12-04T10:11:57.5862229Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5862914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5862990Z if out == self.unknown_value: 2025-12-04T10:11:57.5863113Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5863203Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5863330Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5863673Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5863734Z graph_break [] 2025-12-04T10:11:57.5863819Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5864112Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.5864199Z Traceback (most recent call last): 2025-12-04T10:11:57.5864509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5864613Z method(*args, **kwargs) 2025-12-04T10:11:57.5864907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5864969Z method(*args, **kwargs) 2025-12-04T10:11:57.5865262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5865320Z with policy(): 2025-12-04T10:11:57.5865616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5865689Z raise RuntimeError(msg) 2025-12-04T10:11:57.5866515Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5866522Z 2025-12-04T10:11:57.5866657Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5867179Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5867183Z 2025-12-04T10:11:57.5867337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5867552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5867644Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5867997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5868166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5868224Z graph_break [] 2025-12-04T10:11:57.5868353Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5869043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5869116Z if out == self.unknown_value: 2025-12-04T10:11:57.5869241Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5869332Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5869461Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5869798Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5869866Z graph_break [] 2025-12-04T10:11:57.5869995Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5870083Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5870208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5870546Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5870604Z graph_break [] 2025-12-04T10:11:57.5871106Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml - 2025-12-04T10:11:57.5871207Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5872515Z FAILED [0.4493s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5872559Z 2025-12-04T10:11:57.5872687Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5873219Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5873222Z 2025-12-04T10:11:57.5873379Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5873486Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5873611Z ================== 1 failed, 57 deselected, 2 rerun in 11.85s ================== 2025-12-04T10:11:57.5873669Z Got exit code 1 2025-12-04T10:11:57.5874150Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.5874394Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5874660Z W1204 09:40:01.651000 43776 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5875131Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml 2025-12-04T10:11:57.5875230Z ============================= test session starts ============================== 2025-12-04T10:11:57.5875483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5875549Z cachedir: .pytest_cache 2025-12-04T10:11:57.5875857Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5875937Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5876002Z configfile: 
pytest.ini 2025-12-04T10:11:57.5876316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5876449Z collecting ... collected 58 items / 17 deselected / 41 selected 2025-12-04T10:11:57.5876538Z stepcurrent: skipping 17 already run items. 2025-12-04T10:11:57.5876614Z Running 41 items in this shard 2025-12-04T10:11:57.5876618Z 2025-12-04T10:11:57.5877117Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8498s] [ 2%] 2025-12-04T10:11:57.5877606Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4611s] [ 2%] 2025-12-04T10:11:57.5878052Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4613s] [ 2%] 2025-12-04T10:11:57.5878056Z 2025-12-04T10:11:57.5878142Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5878445Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5878522Z Traceback (most recent call last): 2025-12-04T10:11:57.5878830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5878940Z method(*args, **kwargs) 2025-12-04T10:11:57.5879232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5879301Z method(*args, **kwargs) 2025-12-04T10:11:57.5879589Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5879647Z with policy(): 2025-12-04T10:11:57.5880000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5880069Z raise RuntimeError(msg) 2025-12-04T10:11:57.5880876Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5880884Z 2025-12-04T10:11:57.5881011Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5881540Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5881549Z 2025-12-04T10:11:57.5881706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5881832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5881939Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5882369Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5882502Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5882605Z graph_break [] 2025-12-04T10:11:57.5882898Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5882979Z Traceback (most recent call last): 2025-12-04T10:11:57.5883278Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5883343Z method(*args, **kwargs) 2025-12-04T10:11:57.5883638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5883700Z method(*args, **kwargs) 2025-12-04T10:11:57.5883994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5884057Z with policy(): 2025-12-04T10:11:57.5884346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5884418Z raise RuntimeError(msg) 2025-12-04T10:11:57.5885221Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5885225Z 2025-12-04T10:11:57.5885356Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5885881Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5885885Z 2025-12-04T10:11:57.5886053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5886186Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5886319Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5886670Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5886797Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5886855Z graph_break [] 2025-12-04T10:11:57.5886985Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5887074Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5887193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5887543Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5887601Z graph_break [] 2025-12-04T10:11:57.5887688Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5887981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5888055Z Traceback (most recent call last): 2025-12-04T10:11:57.5888355Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5888418Z method(*args, **kwargs) 2025-12-04T10:11:57.5888709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5888776Z method(*args, **kwargs) 2025-12-04T10:11:57.5889135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5889200Z with policy(): 2025-12-04T10:11:57.5889489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5889590Z raise RuntimeError(msg) 2025-12-04T10:11:57.5890407Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5890411Z 2025-12-04T10:11:57.5890534Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5891058Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5891061Z 2025-12-04T10:11:57.5891213Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5891344Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5891449Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5891795Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5891923Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5891981Z graph_break [] 2025-12-04T10:11:57.5892103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5892195Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5892316Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5892659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5892719Z graph_break [] 2025-12-04T10:11:57.5892841Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5893003Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5893121Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5893457Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5893522Z graph_break [] 2025-12-04T10:11:57.5894010Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml - 2025-12-04T10:11:57.5894116Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5895388Z FAILED [0.4613s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5895396Z 2025-12-04T10:11:57.5895523Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5896036Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5896040Z 2025-12-04T10:11:57.5896264Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5896376Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5896490Z ================== 1 failed, 17 deselected, 2 rerun in 2.80s =================== 2025-12-04T10:11:57.5896587Z Got exit code 1 2025-12-04T10:11:57.5896651Z Retrying single test... 
2025-12-04T10:11:57.5896913Z W1204 09:40:11.309000 43957 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5897305Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml 2025-12-04T10:11:57.5897401Z ============================= test session starts ============================== 2025-12-04T10:11:57.5897613Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5897680Z cachedir: .pytest_cache 2025-12-04T10:11:57.5897990Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5898075Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5898141Z configfile: pytest.ini 2025-12-04T10:11:57.5898458Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5898595Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5899163Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5899244Z Running 1 items in this shard 2025-12-04T10:11:57.5899248Z 2025-12-04T10:11:57.5899982Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:12.361453107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5899986Z 2025-12-04T10:11:57.5900292Z [W1204 09:40:21.532843020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5900410Z 2025-12-04T10:11:57.5900703Z [W1204 09:40:21.533137674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5900707Z 2025-12-04T10:11:57.5900999Z [W1204 09:40:21.538963785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5901007Z 2025-12-04T10:11:57.5901298Z [W1204 09:40:21.539536594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5901301Z 2025-12-04T10:11:57.5901592Z [W1204 09:40:21.539721196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5901595Z 2025-12-04T10:11:57.5901889Z [W1204 09:40:21.545317344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5901894Z 2025-12-04T10:11:57.5902183Z [W1204 09:40:21.545893903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5902187Z 2025-12-04T10:11:57.5902482Z [W1204 09:40:21.546063626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5902486Z 2025-12-04T10:11:57.5902568Z ('RERUN', {'yellow': True}) [11.0317s] [100%] 2025-12-04T10:11:57.5903362Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:22.719975453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5903366Z 2025-12-04T10:11:57.5903657Z [W1204 09:40:22.720576483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5903696Z 2025-12-04T10:11:57.5904002Z [W1204 09:40:22.720720386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5904005Z 2025-12-04T10:11:57.5904296Z [W1204 09:40:22.723754688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5904300Z 2025-12-04T10:11:57.5904587Z [W1204 09:40:22.724327598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5904595Z 2025-12-04T10:11:57.5904885Z [W1204 09:40:22.724466050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5904888Z 2025-12-04T10:11:57.5905176Z [W1204 09:40:22.729133051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5905183Z 2025-12-04T10:11:57.5905475Z [W1204 09:40:22.729606539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5905478Z 2025-12-04T10:11:57.5905766Z [W1204 09:40:22.729743221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5905769Z 2025-12-04T10:11:57.5905854Z ('RERUN', {'yellow': True}) [0.4189s] [100%] 2025-12-04T10:11:57.5906574Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:23.137224358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5906578Z 2025-12-04T10:11:57.5906875Z [W1204 09:40:23.137787498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5906923Z 2025-12-04T10:11:57.5907216Z [W1204 09:40:23.137930160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5907219Z 2025-12-04T10:11:57.5907505Z [W1204 09:40:23.141041024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5907514Z 2025-12-04T10:11:57.5907800Z [W1204 09:40:23.141605194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5907803Z 2025-12-04T10:11:57.5908094Z [W1204 09:40:23.141764537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5908097Z 2025-12-04T10:11:57.5908389Z [W1204 09:40:23.146417557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5908394Z 2025-12-04T10:11:57.5908682Z [W1204 09:40:23.146881765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5908685Z 2025-12-04T10:11:57.5908976Z [W1204 09:40:23.147018257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5908979Z 2025-12-04T10:11:57.5909039Z FAILED [0.4145s] [100%] 2025-12-04T10:11:57.5909043Z 2025-12-04T10:11:57.5909130Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5909484Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5909562Z Traceback (most recent call last): 2025-12-04T10:11:57.5909875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5909975Z method(*args, **kwargs) 2025-12-04T10:11:57.5910273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5910343Z method(*args, **kwargs) 2025-12-04T10:11:57.5910633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5910696Z with policy(): 2025-12-04T10:11:57.5910989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5911053Z raise RuntimeError(msg) 2025-12-04T10:11:57.5911860Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5911867Z 2025-12-04T10:11:57.5911998Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5912528Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5912532Z 2025-12-04T10:11:57.5912689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5912819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5912912Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5913264Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5913405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5913467Z graph_break [] 2025-12-04T10:11:57.5913632Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5914329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5914401Z if out == self.unknown_value: 2025-12-04T10:11:57.5914694Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5914767Z Traceback (most recent call last): 2025-12-04T10:11:57.5915075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5915139Z method(*args, **kwargs) 2025-12-04T10:11:57.5915431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5915503Z method(*args, **kwargs) 2025-12-04T10:11:57.5915791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5915860Z with policy(): 2025-12-04T10:11:57.5916155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5916220Z raise RuntimeError(msg) 2025-12-04T10:11:57.5917329Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5917335Z 2025-12-04T10:11:57.5917478Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5918021Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5918094Z 2025-12-04T10:11:57.5918259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5918391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5918484Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5918833Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5918968Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5919027Z graph_break [] 2025-12-04T10:11:57.5919151Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5919859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5919970Z if out == self.unknown_value: 2025-12-04T10:11:57.5920097Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5920189Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5920313Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5920664Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5920727Z graph_break [] 2025-12-04T10:11:57.5920814Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5921106Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5921237Z Traceback (most recent call last): 2025-12-04T10:11:57.5921549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5921612Z method(*args, **kwargs) 2025-12-04T10:11:57.5921903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5921970Z method(*args, **kwargs) 2025-12-04T10:11:57.5922269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5922347Z with policy(): 2025-12-04T10:11:57.5922644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5922709Z raise RuntimeError(msg) 2025-12-04T10:11:57.5923532Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5923540Z 2025-12-04T10:11:57.5923665Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5924188Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5924192Z 2025-12-04T10:11:57.5924419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5924545Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5924640Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5925017Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5925147Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5925205Z graph_break [] 2025-12-04T10:11:57.5925328Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5926032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5926103Z if out == self.unknown_value: 2025-12-04T10:11:57.5926231Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5926320Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5926443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5926796Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5926855Z graph_break [] 2025-12-04T10:11:57.5926977Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5927069Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5927188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5927534Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5927593Z graph_break [] 2025-12-04T10:11:57.5928082Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml - 2025-12-04T10:11:57.5928200Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5929522Z FAILED [0.4145s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5929526Z 2025-12-04T10:11:57.5929659Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5930177Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5930182Z 2025-12-04T10:11:57.5930340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5930447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5930562Z ================== 1 failed, 57 deselected, 2 rerun in 11.89s ================== 2025-12-04T10:11:57.5930627Z Got exit code 1 2025-12-04T10:11:57.5930690Z Retrying single test... 2025-12-04T10:11:57.5930963Z W1204 09:40:29.793000 44143 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5931417Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml 2025-12-04T10:11:57.5931515Z ============================= test session starts ============================== 2025-12-04T10:11:57.5931726Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5931825Z cachedir: .pytest_cache 2025-12-04T10:11:57.5932133Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5932216Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5932281Z configfile: pytest.ini 2025-12-04T10:11:57.5932600Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5932728Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5933298Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5933380Z Running 1 items in this shard 2025-12-04T10:11:57.5933384Z 2025-12-04T10:11:57.5934115Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:30.847224093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5934121Z 2025-12-04T10:11:57.5934421Z [W1204 09:40:39.837374438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5934424Z 2025-12-04T10:11:57.5934716Z [W1204 09:40:39.837622422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5934720Z 2025-12-04T10:11:57.5935019Z [W1204 09:40:39.843460002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5935022Z 2025-12-04T10:11:57.5935313Z [W1204 09:40:39.844035652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5935318Z 2025-12-04T10:11:57.5935664Z [W1204 09:40:39.844218305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5935667Z 2025-12-04T10:11:57.5935955Z [W1204 09:40:39.849620027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5935958Z 2025-12-04T10:11:57.5936247Z [W1204 09:40:39.850187067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5936256Z 2025-12-04T10:11:57.5936546Z [W1204 09:40:39.850368960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5936549Z 2025-12-04T10:11:57.5936633Z ('RERUN', {'yellow': True}) [10.8515s] [100%] 2025-12-04T10:11:57.5937356Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:41.027024747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5937361Z 2025-12-04T10:11:57.5937653Z [W1204 09:40:41.027609367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5937656Z 2025-12-04T10:11:57.5937948Z [W1204 09:40:41.027754250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5937952Z 2025-12-04T10:11:57.5938308Z [W1204 09:40:41.030698590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5938312Z 2025-12-04T10:11:57.5938607Z [W1204 09:40:41.031264999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5938610Z 2025-12-04T10:11:57.5938930Z [W1204 09:40:41.031402422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5938936Z 2025-12-04T10:11:57.5939229Z [W1204 09:40:41.035876348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5939237Z 2025-12-04T10:11:57.5939523Z [W1204 09:40:41.036344036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5939526Z 2025-12-04T10:11:57.5939815Z [W1204 09:40:41.036480339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5939820Z 2025-12-04T10:11:57.5939905Z ('RERUN', {'yellow': True}) [0.4156s] [100%] 2025-12-04T10:11:57.5940623Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:41.440301269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5940629Z 2025-12-04T10:11:57.5940929Z [W1204 09:40:41.440874559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5940932Z 2025-12-04T10:11:57.5941219Z [W1204 09:40:41.441018661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5941223Z 2025-12-04T10:11:57.5941519Z [W1204 09:40:41.443919671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5941522Z 2025-12-04T10:11:57.5941811Z [W1204 09:40:41.444483660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5941815Z 2025-12-04T10:11:57.5942113Z [W1204 09:40:41.444621693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5942154Z 2025-12-04T10:11:57.5942443Z [W1204 09:40:41.449076339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5942447Z 2025-12-04T10:11:57.5942739Z [W1204 09:40:41.449526457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5942748Z 2025-12-04T10:11:57.5943041Z [W1204 09:40:41.449662209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5943047Z 2025-12-04T10:11:57.5943113Z FAILED [0.4107s] [100%] 2025-12-04T10:11:57.5943116Z 2025-12-04T10:11:57.5943205Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5943501Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5943584Z Traceback (most recent call last): 2025-12-04T10:11:57.5943894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5943959Z method(*args, **kwargs) 2025-12-04T10:11:57.5944255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5944318Z method(*args, **kwargs) 2025-12-04T10:11:57.5944610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5944768Z with policy(): 2025-12-04T10:11:57.5945067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.5945137Z raise RuntimeError(msg) 2025-12-04T10:11:57.5945936Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5945976Z 2025-12-04T10:11:57.5946113Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5946643Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5946647Z 2025-12-04T10:11:57.5946807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5946940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5947034Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5947385Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5947517Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5947576Z graph_break [] 2025-12-04T10:11:57.5947704Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5948396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5948468Z if out == self.unknown_value: 2025-12-04T10:11:57.5948763Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5948836Z Traceback (most recent call last): 2025-12-04T10:11:57.5949141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5949244Z method(*args, **kwargs) 2025-12-04T10:11:57.5949536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5949603Z method(*args, **kwargs) 2025-12-04T10:11:57.5949892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5949951Z with policy(): 2025-12-04T10:11:57.5950248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5950315Z raise RuntimeError(msg) 2025-12-04T10:11:57.5951128Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5951135Z 2025-12-04T10:11:57.5951260Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5951784Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5951787Z 2025-12-04T10:11:57.5951942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5952067Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5952235Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5952581Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5952745Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5952806Z graph_break [] 2025-12-04T10:11:57.5952928Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5953625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5953696Z if out == self.unknown_value: 2025-12-04T10:11:57.5953818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5953917Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5954044Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5954401Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5954465Z graph_break [] 2025-12-04T10:11:57.5954549Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5954843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.5954915Z Traceback (most recent call last): 2025-12-04T10:11:57.5955217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5955281Z method(*args, **kwargs) 2025-12-04T10:11:57.5955577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5955646Z method(*args, **kwargs) 2025-12-04T10:11:57.5955933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5955995Z with policy(): 2025-12-04T10:11:57.5956336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5956402Z raise RuntimeError(msg) 2025-12-04T10:11:57.5957214Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5957218Z 2025-12-04T10:11:57.5957343Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5957862Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5957871Z 2025-12-04T10:11:57.5958034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5958159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5958265Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5958612Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5958737Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5958803Z graph_break [] 2025-12-04T10:11:57.5958927Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.5959693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.5959794Z if out == self.unknown_value: 2025-12-04T10:11:57.5959970Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5960073Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5960196Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5960544Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5960603Z graph_break [] 2025-12-04T10:11:57.5960727Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5960823Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5960945Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5961284Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5961350Z graph_break [] 2025-12-04T10:11:57.5961836Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml - 2025-12-04T10:11:57.5961943Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5963228Z FAILED [0.4107s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5963232Z 2025-12-04T10:11:57.5963362Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5963920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5963924Z 2025-12-04T10:11:57.5964082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5964186Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5964299Z ================== 1 failed, 57 deselected, 2 rerun in 11.70s ================== 2025-12-04T10:11:57.5964362Z Got exit code 1 2025-12-04T10:11:57.5964838Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.5965081Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.5965351Z W1204 09:40:48.085000 44329 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5965741Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml 2025-12-04T10:11:57.5965839Z ============================= test session starts ============================== 2025-12-04T10:11:57.5966043Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5966108Z cachedir: .pytest_cache 2025-12-04T10:11:57.5966498Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5966577Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5966650Z configfile: pytest.ini 
2025-12-04T10:11:57.5966967Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5967130Z collecting ... collected 58 items / 18 deselected / 40 selected 2025-12-04T10:11:57.5967222Z stepcurrent: skipping 18 already run items. 2025-12-04T10:11:57.5967293Z Running 40 items in this shard 2025-12-04T10:11:57.5967296Z 2025-12-04T10:11:57.5967792Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9077s] [ 2%] 2025-12-04T10:11:57.5968296Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4514s] [ 2%] 2025-12-04T10:11:57.5968736Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4490s] [ 2%] 2025-12-04T10:11:57.5968740Z 2025-12-04T10:11:57.5968837Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.5969129Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5969208Z Traceback (most recent call last): 2025-12-04T10:11:57.5969514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5969577Z method(*args, **kwargs) 2025-12-04T10:11:57.5969873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5969936Z method(*args, **kwargs) 2025-12-04T10:11:57.5970224Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5970291Z with policy(): 2025-12-04T10:11:57.5970761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5970920Z raise RuntimeError(msg) 2025-12-04T10:11:57.5971724Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.5971728Z 2025-12-04T10:11:57.5971856Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5972384Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5972388Z 2025-12-04T10:11:57.5972548Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5972681Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5972778Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5973129Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5973255Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5973314Z graph_break [] 2025-12-04T10:11:57.5973608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5973681Z Traceback (most recent call last): 2025-12-04T10:11:57.5974072Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5974149Z method(*args, **kwargs) 2025-12-04T10:11:57.5974444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5974553Z method(*args, **kwargs) 2025-12-04T10:11:57.5974841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5974902Z with policy(): 2025-12-04T10:11:57.5975199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5975264Z raise RuntimeError(msg) 2025-12-04T10:11:57.5976071Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.5976081Z 2025-12-04T10:11:57.5976205Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5976723Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5976728Z 2025-12-04T10:11:57.5976888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5977018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5977113Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5977457Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5977584Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5977650Z graph_break [] 2025-12-04T10:11:57.5977772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5977869Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5978036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5978377Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5978440Z graph_break [] 2025-12-04T10:11:57.5978522Z =================================== FAILURES =================================== 2025-12-04T10:11:57.5978809Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.5978892Z Traceback (most recent call last): 2025-12-04T10:11:57.5979193Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5979262Z method(*args, **kwargs) 2025-12-04T10:11:57.5979549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.5979616Z method(*args, **kwargs) 2025-12-04T10:11:57.5979906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.5979965Z with policy(): 2025-12-04T10:11:57.5980260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.5980332Z raise RuntimeError(msg) 2025-12-04T10:11:57.5981204Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.5981208Z 2025-12-04T10:11:57.5981338Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5981890Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5981894Z 2025-12-04T10:11:57.5982055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5982179Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5982267Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5982626Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5982757Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5982818Z graph_break [] 2025-12-04T10:11:57.5982948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5983039Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5983166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5983507Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5983570Z graph_break [] 2025-12-04T10:11:57.5983697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.5983786Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.5983910Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.5984261Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.5984320Z graph_break [] 2025-12-04T10:11:57.5984813Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml - 2025-12-04T10:11:57.5984951Z =========================== short test summary info ============================ 2025-12-04T10:11:57.5986238Z FAILED [0.4490s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.5986242Z 2025-12-04T10:11:57.5986365Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.5986878Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5986888Z 2025-12-04T10:11:57.5987041Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.5987144Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.5987264Z ================== 1 failed, 18 deselected, 2 rerun in 2.83s =================== 2025-12-04T10:11:57.5987323Z Got exit code 1 2025-12-04T10:11:57.5987386Z Retrying single test... 
2025-12-04T10:11:57.5987663Z W1204 09:40:57.744000 44510 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.5988121Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml 2025-12-04T10:11:57.5988229Z ============================= test session starts ============================== 2025-12-04T10:11:57.5988443Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.5988544Z cachedir: .pytest_cache 2025-12-04T10:11:57.5988858Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.5988933Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.5988997Z configfile: pytest.ini 2025-12-04T10:11:57.5989316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.5989445Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.5990024Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.5990096Z Running 1 items in this shard 2025-12-04T10:11:57.5990100Z 2025-12-04T10:11:57.5990835Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:40:59.021696814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5990841Z 2025-12-04T10:11:57.5991138Z [W1204 09:41:08.316248202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5991142Z 2025-12-04T10:11:57.5991435Z [W1204 09:41:08.316516437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5991446Z 2025-12-04T10:11:57.5991734Z [W1204 09:41:08.322455048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5991737Z 2025-12-04T10:11:57.5992025Z [W1204 09:41:08.323064639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5992065Z 2025-12-04T10:11:57.5992356Z [W1204 09:41:08.323240092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5992359Z 2025-12-04T10:11:57.5992647Z [W1204 09:41:08.328750066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5992649Z 2025-12-04T10:11:57.5992939Z [W1204 09:41:08.329296635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5992943Z 2025-12-04T10:11:57.5993231Z [W1204 09:41:08.329485538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5993235Z 2025-12-04T10:11:57.5993323Z ('RERUN', {'yellow': True}) [11.2000s] [100%] 2025-12-04T10:11:57.5994046Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:09.326363869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5994051Z 2025-12-04T10:11:57.5994342Z [W1204 09:41:09.326925339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5994345Z 2025-12-04T10:11:57.5994634Z [W1204 09:41:09.327066121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5994638Z 2025-12-04T10:11:57.5994991Z [W1204 09:41:09.329945051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5994995Z 2025-12-04T10:11:57.5995289Z [W1204 09:41:09.330521500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5995331Z 2025-12-04T10:11:57.5995617Z [W1204 09:41:09.330664423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5995620Z 2025-12-04T10:11:57.5995913Z [W1204 09:41:09.335123039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5995916Z 2025-12-04T10:11:57.5996203Z [W1204 09:41:09.335576517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5996207Z 2025-12-04T10:11:57.5996499Z [W1204 09:41:09.335714099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5996502Z 2025-12-04T10:11:57.5996580Z ('RERUN', {'yellow': True}) [0.4173s] [100%] 2025-12-04T10:11:57.5997306Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:09.740855041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5997313Z 2025-12-04T10:11:57.5997600Z [W1204 09:41:09.741404211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5997603Z 2025-12-04T10:11:57.5997901Z [W1204 09:41:09.741548003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5997908Z 2025-12-04T10:11:57.5998197Z [W1204 09:41:09.744401562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5998200Z 2025-12-04T10:11:57.5998488Z [W1204 09:41:09.744941411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5998548Z 2025-12-04T10:11:57.5998842Z [W1204 09:41:09.745080473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5998845Z 2025-12-04T10:11:57.5999133Z [W1204 09:41:09.749483278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.5999136Z 2025-12-04T10:11:57.5999429Z [W1204 09:41:09.749935466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5999432Z 2025-12-04T10:11:57.5999722Z [W1204 09:41:09.750092239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.5999725Z 2025-12-04T10:11:57.5999793Z FAILED [0.4143s] [100%] 2025-12-04T10:11:57.5999796Z 2025-12-04T10:11:57.5999959Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6000266Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6000346Z Traceback (most recent call last): 2025-12-04T10:11:57.6000654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6000724Z method(*args, **kwargs) 2025-12-04T10:11:57.6001021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6001085Z method(*args, **kwargs) 2025-12-04T10:11:57.6001454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6001516Z with policy(): 2025-12-04T10:11:57.6001809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6001918Z raise RuntimeError(msg) 2025-12-04T10:11:57.6002711Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6002715Z 2025-12-04T10:11:57.6002848Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6003371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6003376Z 2025-12-04T10:11:57.6003537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6003666Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6003762Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6004117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6004246Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6004321Z graph_break [] 2025-12-04T10:11:57.6004449Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6005143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6005216Z if out == self.unknown_value: 2025-12-04T10:11:57.6005506Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6005627Z Traceback (most recent call last): 2025-12-04T10:11:57.6005940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6006003Z method(*args, **kwargs) 2025-12-04T10:11:57.6006295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6006356Z method(*args, **kwargs) 2025-12-04T10:11:57.6006644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6006708Z with policy(): 2025-12-04T10:11:57.6007001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6007066Z raise RuntimeError(msg) 2025-12-04T10:11:57.6007879Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6007886Z 2025-12-04T10:11:57.6008010Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6008535Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6008538Z 2025-12-04T10:11:57.6008758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6008889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6008981Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6009325Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6009571Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6009629Z graph_break [] 2025-12-04T10:11:57.6009753Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6010445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6010513Z if out == self.unknown_value: 2025-12-04T10:11:57.6010638Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6010730Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6010851Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6011209Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6011271Z graph_break [] 2025-12-04T10:11:57.6011360Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6011645Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6011717Z Traceback (most recent call last): 2025-12-04T10:11:57.6012018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6012083Z method(*args, **kwargs) 2025-12-04T10:11:57.6012379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6012441Z method(*args, **kwargs) 2025-12-04T10:11:57.6012725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6012827Z with policy(): 2025-12-04T10:11:57.6013120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6013185Z raise RuntimeError(msg) 2025-12-04T10:11:57.6014000Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6014006Z 2025-12-04T10:11:57.6014129Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6014653Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6014659Z 2025-12-04T10:11:57.6014813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6014939Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6015030Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6015371Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6015496Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6015556Z graph_break [] 2025-12-04T10:11:57.6015742Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6016433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6016535Z if out == self.unknown_value: 2025-12-04T10:11:57.6016659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6016747Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6016869Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6017404Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6017464Z graph_break [] 2025-12-04T10:11:57.6017592Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6017678Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6017799Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6018151Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6018211Z graph_break [] 2025-12-04T10:11:57.6018703Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml - 2025-12-04T10:11:57.6018799Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6020083Z FAILED [0.4143s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6020162Z 2025-12-04T10:11:57.6020290Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6020803Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6020807Z 2025-12-04T10:11:57.6020964Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6021067Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6021199Z ================== 1 failed, 57 deselected, 2 rerun in 12.05s ================== 2025-12-04T10:11:57.6021261Z Got exit code 1 2025-12-04T10:11:57.6021331Z Retrying single test... 2025-12-04T10:11:57.6021605Z W1204 09:41:16.376000 44696 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6021999Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml 2025-12-04T10:11:57.6022098Z ============================= test session starts ============================== 2025-12-04T10:11:57.6022309Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6022373Z cachedir: .pytest_cache 2025-12-04T10:11:57.6022681Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6022756Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6022822Z configfile: pytest.ini 2025-12-04T10:11:57.6023237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6023369Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6023945Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6024084Z Running 1 items in this shard 2025-12-04T10:11:57.6024088Z 2025-12-04T10:11:57.6024813Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:17.646103076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6024818Z 2025-12-04T10:11:57.6025134Z [W1204 09:41:26.789116142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6025138Z 2025-12-04T10:11:57.6025426Z [W1204 09:41:26.789372466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6025431Z 2025-12-04T10:11:57.6025726Z [W1204 09:41:26.795166008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6025730Z 2025-12-04T10:11:57.6026015Z [W1204 09:41:26.795757888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6026018Z 2025-12-04T10:11:57.6026308Z [W1204 09:41:26.795930621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6026312Z 2025-12-04T10:11:57.6026601Z [W1204 09:41:26.801429498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6026604Z 2025-12-04T10:11:57.6026896Z [W1204 09:41:26.801977777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6026900Z 2025-12-04T10:11:57.6027190Z [W1204 09:41:26.802167811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6027231Z 2025-12-04T10:11:57.6027313Z ('RERUN', {'yellow': True}) [11.0362s] [100%] 2025-12-04T10:11:57.6028044Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:27.792403230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6028048Z 2025-12-04T10:11:57.6028337Z [W1204 09:41:27.792964500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6028340Z 2025-12-04T10:11:57.6028631Z [W1204 09:41:27.793105092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6028636Z 2025-12-04T10:11:57.6028923Z [W1204 09:41:27.796037163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6028925Z 2025-12-04T10:11:57.6029215Z [W1204 09:41:27.796603803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6029218Z 2025-12-04T10:11:57.6029508Z [W1204 09:41:27.796742596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6029511Z 2025-12-04T10:11:57.6029870Z [W1204 09:41:27.801273544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6029874Z 2025-12-04T10:11:57.6030163Z [W1204 09:41:27.801735372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6030166Z 2025-12-04T10:11:57.6030494Z [W1204 09:41:27.801872205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6030499Z 2025-12-04T10:11:57.6030577Z ('RERUN', {'yellow': True}) [0.4147s] [100%] 2025-12-04T10:11:57.6031292Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:28.205184171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6031301Z 2025-12-04T10:11:57.6031593Z [W1204 09:41:28.205745750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6031596Z 2025-12-04T10:11:57.6031881Z [W1204 09:41:28.205887213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6031884Z 2025-12-04T10:11:57.6032173Z [W1204 09:41:28.208811654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6032178Z 2025-12-04T10:11:57.6032463Z [W1204 09:41:28.209359463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6032466Z 2025-12-04T10:11:57.6032758Z [W1204 09:41:28.209499556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6032762Z 2025-12-04T10:11:57.6033054Z [W1204 09:41:28.213936193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6033057Z 2025-12-04T10:11:57.6033344Z [W1204 09:41:28.214386031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6033347Z 2025-12-04T10:11:57.6033633Z [W1204 09:41:28.214522374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6033673Z 2025-12-04T10:11:57.6033734Z FAILED [0.4118s] [100%] 2025-12-04T10:11:57.6033742Z 2025-12-04T10:11:57.6033823Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6034115Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6034192Z Traceback (most recent call last): 2025-12-04T10:11:57.6034498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6034564Z method(*args, **kwargs) 2025-12-04T10:11:57.6034860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6034924Z method(*args, **kwargs) 2025-12-04T10:11:57.6035217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6035277Z with policy(): 2025-12-04T10:11:57.6035568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6035637Z raise RuntimeError(msg) 2025-12-04T10:11:57.6036429Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6036504Z 2025-12-04T10:11:57.6036645Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6037173Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6037211Z 2025-12-04T10:11:57.6037367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6037501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6037593Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6037954Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6038080Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6038138Z graph_break [] 2025-12-04T10:11:57.6038269Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6038964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6039041Z if out == self.unknown_value: 2025-12-04T10:11:57.6039331Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6039407Z Traceback (most recent call last): 2025-12-04T10:11:57.6039719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6039781Z method(*args, **kwargs) 2025-12-04T10:11:57.6040122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6040193Z method(*args, **kwargs) 2025-12-04T10:11:57.6040481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6040543Z with policy(): 2025-12-04T10:11:57.6040838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6040943Z raise RuntimeError(msg) 2025-12-04T10:11:57.6041753Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6041757Z 2025-12-04T10:11:57.6041883Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6042405Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6042409Z 2025-12-04T10:11:57.6042563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6042691Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6042788Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6043148Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6043282Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6043339Z graph_break [] 2025-12-04T10:11:57.6043460Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6044220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6044289Z if out == self.unknown_value: 2025-12-04T10:11:57.6044454Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6044546Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6044669Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6045014Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6045072Z graph_break [] 2025-12-04T10:11:57.6045155Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6045454Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6045527Z Traceback (most recent call last): 2025-12-04T10:11:57.6045826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6045902Z method(*args, **kwargs) 2025-12-04T10:11:57.6046196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6046264Z method(*args, **kwargs) 2025-12-04T10:11:57.6046550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6046614Z with policy(): 2025-12-04T10:11:57.6046904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6046970Z raise RuntimeError(msg) 2025-12-04T10:11:57.6047783Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6047788Z 2025-12-04T10:11:57.6047953Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6048471Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6048475Z 2025-12-04T10:11:57.6048628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6048751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6048845Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6049190Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6049316Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6049376Z graph_break [] 2025-12-04T10:11:57.6049503Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6050194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6050262Z if out == self.unknown_value: 2025-12-04T10:11:57.6050387Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6050476Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6050597Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6051031Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6051102Z graph_break [] 2025-12-04T10:11:57.6051258Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6051352Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6051473Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6051820Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6051880Z graph_break [] 2025-12-04T10:11:57.6052369Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml - 2025-12-04T10:11:57.6052476Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6053759Z FAILED [0.4118s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6053766Z 2025-12-04T10:11:57.6053893Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6054412Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6054416Z 2025-12-04T10:11:57.6054573Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6054676Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6054791Z ================== 1 failed, 57 deselected, 2 rerun in 11.89s ================== 2025-12-04T10:11:57.6054895Z Got exit code 1 2025-12-04T10:11:57.6055366Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6055614Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6055877Z W1204 09:41:34.834000 44882 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6056264Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml 2025-12-04T10:11:57.6056366Z ============================= test session starts ============================== 2025-12-04T10:11:57.6056574Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6056643Z cachedir: .pytest_cache 2025-12-04T10:11:57.6056952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6057028Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6057095Z configfile: pytest.ini 
2025-12-04T10:11:57.6057406Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6057535Z collecting ... collected 58 items / 19 deselected / 39 selected 2025-12-04T10:11:57.6057628Z stepcurrent: skipping 19 already run items. 2025-12-04T10:11:57.6057696Z Running 39 items in this shard 2025-12-04T10:11:57.6057699Z 2025-12-04T10:11:57.6058279Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8837s] [ 2%] 2025-12-04T10:11:57.6058774Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4820s] [ 2%] 2025-12-04T10:11:57.6059255Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4932s] [ 2%] 2025-12-04T10:11:57.6059262Z 2025-12-04T10:11:57.6059348Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6059643Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6059723Z Traceback (most recent call last): 2025-12-04T10:11:57.6060033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6060109Z method(*args, **kwargs) 2025-12-04T10:11:57.6060415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6060482Z method(*args, **kwargs) 2025-12-04T10:11:57.6060775Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6060833Z with policy(): 2025-12-04T10:11:57.6061124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6061192Z raise RuntimeError(msg) 2025-12-04T10:11:57.6062005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6062009Z 2025-12-04T10:11:57.6062146Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6062707Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6062711Z 2025-12-04T10:11:57.6062868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6063001Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6063095Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6063453Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6063580Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6063638Z graph_break [] 2025-12-04T10:11:57.6063935Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6064012Z Traceback (most recent call last): 2025-12-04T10:11:57.6064311Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6064375Z method(*args, **kwargs) 2025-12-04T10:11:57.6064665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6064732Z method(*args, **kwargs) 2025-12-04T10:11:57.6065019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6065078Z with policy(): 2025-12-04T10:11:57.6065445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6065511Z raise RuntimeError(msg) 2025-12-04T10:11:57.6066332Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6066370Z 2025-12-04T10:11:57.6066497Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6067020Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6067024Z 2025-12-04T10:11:57.6067184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6067311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6067409Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6067756Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6067896Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6067953Z graph_break [] 2025-12-04T10:11:57.6068077Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6068169Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6068300Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6068648Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6068715Z graph_break [] 2025-12-04T10:11:57.6068797Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6069092Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6069206Z Traceback (most recent call last): 2025-12-04T10:11:57.6069503Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6069576Z method(*args, **kwargs) 2025-12-04T10:11:57.6069866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6069929Z method(*args, **kwargs) 2025-12-04T10:11:57.6070221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6070282Z with policy(): 2025-12-04T10:11:57.6070579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6070643Z raise RuntimeError(msg) 2025-12-04T10:11:57.6071461Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6071468Z 2025-12-04T10:11:57.6071597Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6072117Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6072121Z 2025-12-04T10:11:57.6072345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6072472Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6072561Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6072937Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6073061Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6073120Z graph_break [] 2025-12-04T10:11:57.6073255Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6073347Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6073470Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6073808Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6073871Z graph_break [] 2025-12-04T10:11:57.6073991Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6074079Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6074208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6074549Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6074606Z graph_break [] 2025-12-04T10:11:57.6075100Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml - 2025-12-04T10:11:57.6075199Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6076502Z FAILED [0.4932s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6076546Z 2025-12-04T10:11:57.6076670Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6077192Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6077197Z 2025-12-04T10:11:57.6077349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6077454Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6077571Z ================== 1 failed, 19 deselected, 2 rerun in 2.88s =================== 2025-12-04T10:11:57.6077631Z Got exit code 1 2025-12-04T10:11:57.6077700Z Retrying single test... 
2025-12-04T10:11:57.6077966Z W1204 09:41:44.593000 45070 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6078352Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml 2025-12-04T10:11:57.6078453Z ============================= test session starts ============================== 2025-12-04T10:11:57.6078659Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6078723Z cachedir: .pytest_cache 2025-12-04T10:11:57.6079124Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6079205Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6079275Z configfile: pytest.ini 2025-12-04T10:11:57.6079593Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6079757Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6080379Z stepcurrent: skipping 19 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6080451Z Running 1 items in this shard 2025-12-04T10:11:57.6080454Z 2025-12-04T10:11:57.6081192Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:41:45.673852341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6081196Z 2025-12-04T10:11:57.6081493Z [W1204 09:41:54.809742327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6081499Z 2025-12-04T10:11:57.6081792Z [W1204 09:41:54.809986691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6081797Z 2025-12-04T10:11:57.6082087Z [W1204 09:41:54.815805031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6082091Z 2025-12-04T10:11:57.6082382Z [W1204 09:41:54.816403531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6082386Z 2025-12-04T10:11:57.6082675Z [W1204 09:41:54.816586214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6082679Z 2025-12-04T10:11:57.6082964Z [W1204 09:41:54.822002736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6082967Z 2025-12-04T10:11:57.6083258Z [W1204 09:41:54.822535746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6083301Z 2025-12-04T10:11:57.6083588Z [W1204 09:41:54.822698198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6083592Z 2025-12-04T10:11:57.6083678Z ('RERUN', {'yellow': True}) [11.0162s] [100%] 2025-12-04T10:11:57.6084403Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:41:56.025971803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6084407Z 2025-12-04T10:11:57.6084703Z [W1204 09:41:56.026497112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6084710Z 2025-12-04T10:11:57.6085001Z [W1204 09:41:56.026641925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6085006Z 2025-12-04T10:11:57.6085297Z [W1204 09:41:56.029614446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6085300Z 2025-12-04T10:11:57.6085586Z [W1204 09:41:56.030200525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6085589Z 2025-12-04T10:11:57.6085876Z [W1204 09:41:56.030342298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6085949Z 2025-12-04T10:11:57.6086237Z [W1204 09:41:56.034936017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6086240Z 2025-12-04T10:11:57.6086526Z [W1204 09:41:56.035392124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6086577Z 2025-12-04T10:11:57.6086868Z [W1204 09:41:56.035528347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6086871Z 2025-12-04T10:11:57.6086949Z ('RERUN', {'yellow': True}) [0.4493s] [100%] 2025-12-04T10:11:57.6087681Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:41:56.473748002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6087687Z 2025-12-04T10:11:57.6087974Z [W1204 09:41:56.474278791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6087977Z 2025-12-04T10:11:57.6088266Z [W1204 09:41:56.474419483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6088272Z 2025-12-04T10:11:57.6088556Z [W1204 09:41:56.477410564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6088560Z 2025-12-04T10:11:57.6088848Z [W1204 09:41:56.477974154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6088851Z 2025-12-04T10:11:57.6089137Z [W1204 09:41:56.478116326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6089143Z 2025-12-04T10:11:57.6089429Z [W1204 09:41:56.482739395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6089439Z 2025-12-04T10:11:57.6089733Z [W1204 09:41:56.483203563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6089773Z 2025-12-04T10:11:57.6090060Z [W1204 09:41:56.483345405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6090063Z 2025-12-04T10:11:57.6090128Z FAILED [0.4462s] [100%] 2025-12-04T10:11:57.6090132Z 2025-12-04T10:11:57.6090214Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6090516Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6090590Z Traceback (most recent call last): 2025-12-04T10:11:57.6090896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6090965Z method(*args, **kwargs) 2025-12-04T10:11:57.6091258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6091323Z method(*args, **kwargs) 2025-12-04T10:11:57.6091621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6091679Z with policy(): 2025-12-04T10:11:57.6091975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6092039Z raise RuntimeError(msg) 2025-12-04T10:11:57.6092913Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6092925Z 2025-12-04T10:11:57.6093056Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6093613Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6093617Z 2025-12-04T10:11:57.6093781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6093908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6094005Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6094355Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6094486Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6094549Z graph_break [] 2025-12-04T10:11:57.6094671Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6095378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6095451Z if out == self.unknown_value: 2025-12-04T10:11:57.6095743Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6095822Z Traceback (most recent call last): 2025-12-04T10:11:57.6096122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6096189Z method(*args, **kwargs) 2025-12-04T10:11:57.6096481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6096542Z method(*args, **kwargs) 2025-12-04T10:11:57.6096835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6096935Z with policy(): 2025-12-04T10:11:57.6097226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6097295Z raise RuntimeError(msg) 2025-12-04T10:11:57.6098114Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6098120Z 2025-12-04T10:11:57.6098253Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6098775Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6098782Z 2025-12-04T10:11:57.6098944Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6099068Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6099160Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6099526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6099654Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6099712Z graph_break [] 2025-12-04T10:11:57.6099906Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6100595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6100704Z if out == self.unknown_value: 2025-12-04T10:11:57.6100825Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6100913Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6101035Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6101375Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6101435Z graph_break [] 2025-12-04T10:11:57.6101519Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6101809Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6101883Z Traceback (most recent call last): 2025-12-04T10:11:57.6102187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6102250Z method(*args, **kwargs) 2025-12-04T10:11:57.6102543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6102609Z method(*args, **kwargs) 2025-12-04T10:11:57.6102898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6102958Z with policy(): 2025-12-04T10:11:57.6103251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6103321Z raise RuntimeError(msg) 2025-12-04T10:11:57.6104143Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6104202Z 2025-12-04T10:11:57.6104330Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6104849Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6104853Z 2025-12-04T10:11:57.6105006Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6105134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6105223Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6105567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6105691Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6105747Z graph_break [] 2025-12-04T10:11:57.6105873Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6106555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6106634Z if out == self.unknown_value: 2025-12-04T10:11:57.6106832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6106921Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6107046Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6107383Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6107476Z graph_break [] 2025-12-04T10:11:57.6107603Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6107691Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6107813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6108155Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6108213Z graph_break [] 2025-12-04T10:11:57.6108702Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml - 2025-12-04T10:11:57.6108801Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6110104Z FAILED [0.4462s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6110110Z 2025-12-04T10:11:57.6110236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6110760Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6110764Z 2025-12-04T10:11:57.6110918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6111058Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6111178Z ================== 1 failed, 57 deselected, 2 rerun in 11.94s ================== 2025-12-04T10:11:57.6111237Z Got exit code 1 2025-12-04T10:11:57.6111307Z Retrying single test... 2025-12-04T10:11:57.6111572Z W1204 09:42:03.120000 45263 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6111955Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml 2025-12-04T10:11:57.6112058Z ============================= test session starts ============================== 2025-12-04T10:11:57.6112270Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6112334Z cachedir: .pytest_cache 2025-12-04T10:11:57.6112643Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6112721Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6112789Z configfile: pytest.ini 2025-12-04T10:11:57.6113102Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6113229Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6113805Z stepcurrent: skipping 19 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6113948Z Running 1 items in this shard 2025-12-04T10:11:57.6113951Z 2025-12-04T10:11:57.6114686Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:42:04.235878468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6114724Z 2025-12-04T10:11:57.6115024Z [W1204 09:42:13.430814353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6115027Z 2025-12-04T10:11:57.6115321Z [W1204 09:42:13.431069627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6115324Z 2025-12-04T10:11:57.6115613Z [W1204 09:42:13.436926356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6115617Z 2025-12-04T10:11:57.6115906Z [W1204 09:42:13.437513156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6115913Z 2025-12-04T10:11:57.6116200Z [W1204 09:42:13.437683089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6116206Z 2025-12-04T10:11:57.6116490Z [W1204 09:42:13.443252303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6116493Z 2025-12-04T10:11:57.6116784Z [W1204 09:42:13.443782781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6116787Z 2025-12-04T10:11:57.6117223Z [W1204 09:42:13.443940394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6117227Z 2025-12-04T10:11:57.6117321Z ('RERUN', {'yellow': True}) [11.1149s] [100%] 2025-12-04T10:11:57.6118052Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:42:14.656281917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6118139Z 2025-12-04T10:11:57.6118434Z [W1204 09:42:14.656872107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6118438Z 2025-12-04T10:11:57.6118725Z [W1204 09:42:14.657029119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6118728Z 2025-12-04T10:11:57.6119017Z [W1204 09:42:14.659964938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6119022Z 2025-12-04T10:11:57.6119312Z [W1204 09:42:14.660562619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6119315Z 2025-12-04T10:11:57.6119599Z [W1204 09:42:14.660711821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6119609Z 2025-12-04T10:11:57.6119927Z [W1204 09:42:14.665246097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6119930Z 2025-12-04T10:11:57.6120217Z [W1204 09:42:14.665717705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6120220Z 2025-12-04T10:11:57.6120511Z [W1204 09:42:14.665879758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6120514Z 2025-12-04T10:11:57.6120773Z ('RERUN', {'yellow': True}) [0.4554s] [100%] 2025-12-04T10:11:57.6121510Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:42:15.108945923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6121561Z 2025-12-04T10:11:57.6121853Z [W1204 09:42:15.109473212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6121856Z 2025-12-04T10:11:57.6122149Z [W1204 09:42:15.109616114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6122152Z 2025-12-04T10:11:57.6122436Z [W1204 09:42:15.112530143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6122439Z 2025-12-04T10:11:57.6122733Z [W1204 09:42:15.113085133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6122740Z 2025-12-04T10:11:57.6123025Z [W1204 09:42:15.113223065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6123032Z 2025-12-04T10:11:57.6123317Z [W1204 09:42:15.117642700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6123320Z 2025-12-04T10:11:57.6123608Z [W1204 09:42:15.118097128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6123611Z 2025-12-04T10:11:57.6123895Z [W1204 09:42:15.118232350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6123898Z 2025-12-04T10:11:57.6123966Z FAILED [0.4464s] [100%] 2025-12-04T10:11:57.6123969Z 2025-12-04T10:11:57.6124055Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6124353Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6124428Z Traceback (most recent call last): 2025-12-04T10:11:57.6124770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6124838Z method(*args, **kwargs) 2025-12-04T10:11:57.6125130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6125192Z method(*args, **kwargs) 2025-12-04T10:11:57.6125485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6125544Z with policy(): 2025-12-04T10:11:57.6125842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6125906Z raise RuntimeError(msg) 2025-12-04T10:11:57.6126718Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6126725Z 2025-12-04T10:11:57.6126854Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6127376Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6127379Z 2025-12-04T10:11:57.6127542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6127733Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6127827Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6128182Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6128361Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6128425Z graph_break [] 2025-12-04T10:11:57.6128548Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6129241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6129317Z if out == self.unknown_value: 2025-12-04T10:11:57.6129608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6129684Z Traceback (most recent call last): 2025-12-04T10:11:57.6129979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6130044Z method(*args, **kwargs) 2025-12-04T10:11:57.6130337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6130399Z method(*args, **kwargs) 2025-12-04T10:11:57.6130686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6130749Z with policy(): 2025-12-04T10:11:57.6131040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6131111Z raise RuntimeError(msg) 2025-12-04T10:11:57.6131935Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6131977Z 2025-12-04T10:11:57.6132107Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6132630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6132633Z 2025-12-04T10:11:57.6132788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6132917Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6133008Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6133364Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6133490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6133550Z graph_break [] 2025-12-04T10:11:57.6133675Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6134362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6134434Z if out == self.unknown_value: 2025-12-04T10:11:57.6134563Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6134652Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6134849Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6135193Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6135283Z graph_break [] 2025-12-04T10:11:57.6135371Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6135659Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6135735Z Traceback (most recent call last): 2025-12-04T10:11:57.6136043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6136111Z method(*args, **kwargs) 2025-12-04T10:11:57.6136407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6136473Z method(*args, **kwargs) 2025-12-04T10:11:57.6136758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6136821Z with policy(): 2025-12-04T10:11:57.6137111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6137185Z raise RuntimeError(msg) 2025-12-04T10:11:57.6138006Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6138010Z 2025-12-04T10:11:57.6138132Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6138661Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6138665Z 2025-12-04T10:11:57.6138819Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6138987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6139078Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6139417Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6139544Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6139602Z graph_break [] 2025-12-04T10:11:57.6139729Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6140417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6140486Z if out == self.unknown_value: 2025-12-04T10:11:57.6140615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6140706Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6140830Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6141168Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6141226Z graph_break [] 2025-12-04T10:11:57.6141352Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6141437Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6141631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6141977Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6142068Z graph_break [] 2025-12-04T10:11:57.6142564Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml - 2025-12-04T10:11:57.6142664Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6143966Z FAILED [0.4464s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6143971Z 2025-12-04T10:11:57.6144091Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6144611Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6144621Z 2025-12-04T10:11:57.6144777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6144878Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6144996Z ================== 1 failed, 57 deselected, 2 rerun in 12.04s ================== 2025-12-04T10:11:57.6145054Z Got exit code 1 2025-12-04T10:11:57.6145529Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6145779Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6146044Z W1204 09:42:21.748000 45456 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6146472Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml 2025-12-04T10:11:57.6146567Z ============================= test session starts ============================== 2025-12-04T10:11:57.6146773Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6146857Z cachedir: .pytest_cache 2025-12-04T10:11:57.6147163Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6147244Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6147308Z configfile: 
pytest.ini 2025-12-04T10:11:57.6147624Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6147755Z collecting ... collected 58 items / 20 deselected / 38 selected 2025-12-04T10:11:57.6147843Z stepcurrent: skipping 20 already run items. 2025-12-04T10:11:57.6147912Z Running 38 items in this shard 2025-12-04T10:11:57.6147916Z 2025-12-04T10:11:57.6148412Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8691s] [ 2%] 2025-12-04T10:11:57.6148892Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4514s] [ 2%] 2025-12-04T10:11:57.6149404Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4366s] [ 2%] 2025-12-04T10:11:57.6149409Z 2025-12-04T10:11:57.6149493Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6149826Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6149899Z Traceback (most recent call last): 2025-12-04T10:11:57.6150199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6150268Z method(*args, **kwargs) 2025-12-04T10:11:57.6150557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6150621Z method(*args, **kwargs) 2025-12-04T10:11:57.6150914Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6150973Z with policy(): 2025-12-04T10:11:57.6151273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6151341Z raise RuntimeError(msg) 2025-12-04T10:11:57.6152131Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6152139Z 2025-12-04T10:11:57.6152265Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6152784Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6152787Z 2025-12-04T10:11:57.6152947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6153072Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6153171Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6153557Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6153697Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6153762Z graph_break [] 2025-12-04T10:11:57.6154050Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6154121Z Traceback (most recent call last): 2025-12-04T10:11:57.6154427Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6154491Z method(*args, **kwargs) 2025-12-04T10:11:57.6154791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6154855Z method(*args, **kwargs) 2025-12-04T10:11:57.6155148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6155212Z with policy(): 2025-12-04T10:11:57.6155506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6155572Z raise RuntimeError(msg) 2025-12-04T10:11:57.6156470Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6156475Z 2025-12-04T10:11:57.6156610Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6157129Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6157171Z 2025-12-04T10:11:57.6157327Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6157457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6157551Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6157896Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6158033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6158093Z graph_break [] 2025-12-04T10:11:57.6158218Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6158310Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6158442Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6158791Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6158847Z graph_break [] 2025-12-04T10:11:57.6158929Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6159222Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6159294Z Traceback (most recent call last): 2025-12-04T10:11:57.6159599Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6159662Z method(*args, **kwargs) 2025-12-04T10:11:57.6160002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6160072Z method(*args, **kwargs) 2025-12-04T10:11:57.6160406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6160466Z with policy(): 2025-12-04T10:11:57.6160765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6160833Z raise RuntimeError(msg) 2025-12-04T10:11:57.6161646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6161650Z 2025-12-04T10:11:57.6161772Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6162292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6162299Z 2025-12-04T10:11:57.6162452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6162575Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6162672Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6163009Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6163209Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6163270Z graph_break [] 2025-12-04T10:11:57.6163394Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6163482Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6163638Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6163979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6164039Z graph_break [] 2025-12-04T10:11:57.6164160Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6164250Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6164371Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6164711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6164772Z graph_break [] 2025-12-04T10:11:57.6165258Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml - 2025-12-04T10:11:57.6165358Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6166644Z FAILED [0.4366s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6166648Z 2025-12-04T10:11:57.6166772Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6167288Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6167329Z 2025-12-04T10:11:57.6167479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6167586Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6167703Z ================== 1 failed, 20 deselected, 2 rerun in 2.78s =================== 2025-12-04T10:11:57.6167761Z Got exit code 1 2025-12-04T10:11:57.6167828Z Retrying single test... 
2025-12-04T10:11:57.6168090Z W1204 09:42:31.462000 45637 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6168482Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml 2025-12-04T10:11:57.6168576Z ============================= test session starts ============================== 2025-12-04T10:11:57.6168780Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6168864Z cachedir: .pytest_cache 2025-12-04T10:11:57.6169170Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6169251Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6169315Z configfile: pytest.ini 2025-12-04T10:11:57.6169629Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6169760Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6170394Z stepcurrent: skipping 20 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6170465Z Running 1 items in this shard 2025-12-04T10:11:57.6170474Z 2025-12-04T10:11:57.6171197Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:32.515644840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6171237Z 2025-12-04T10:11:57.6171537Z [W1204 09:42:41.621200903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6171540Z 2025-12-04T10:11:57.6171835Z [W1204 09:42:41.621457927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6171838Z 2025-12-04T10:11:57.6172128Z [W1204 09:42:41.627336527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6172131Z 2025-12-04T10:11:57.6172422Z [W1204 09:42:41.627913067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6172428Z 2025-12-04T10:11:57.6172713Z [W1204 09:42:41.628091010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6172716Z 2025-12-04T10:11:57.6173005Z [W1204 09:42:41.633645025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6173008Z 2025-12-04T10:11:57.6173299Z [W1204 09:42:41.634200354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6173303Z 2025-12-04T10:11:57.6173595Z [W1204 09:42:41.634377008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6173598Z 2025-12-04T10:11:57.6173678Z ('RERUN', {'yellow': True}) [10.9655s] [100%] 2025-12-04T10:11:57.6174396Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:42.808326756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6174441Z 2025-12-04T10:11:57.6174730Z [W1204 09:42:42.808905816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6174733Z 2025-12-04T10:11:57.6175022Z [W1204 09:42:42.809051988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6175025Z 2025-12-04T10:11:57.6175317Z [W1204 09:42:42.812024919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6175321Z 2025-12-04T10:11:57.6175606Z [W1204 09:42:42.812598389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6175612Z 2025-12-04T10:11:57.6175901Z [W1204 09:42:42.812738631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6175904Z 2025-12-04T10:11:57.6176190Z [W1204 09:42:42.817269319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6176194Z 2025-12-04T10:11:57.6176487Z [W1204 09:42:42.817727867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6176491Z 2025-12-04T10:11:57.6176842Z [W1204 09:42:42.817865879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6176846Z 2025-12-04T10:11:57.6176930Z ('RERUN', {'yellow': True}) [0.4159s] [100%] 2025-12-04T10:11:57.6177642Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:43.222606118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6177681Z 2025-12-04T10:11:57.6177965Z [W1204 09:42:43.223180637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6177968Z 2025-12-04T10:11:57.6178260Z [W1204 09:42:43.223328410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6178263Z 2025-12-04T10:11:57.6178556Z [W1204 09:42:43.226272680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6178559Z 2025-12-04T10:11:57.6178849Z [W1204 09:42:43.226827100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6178853Z 2025-12-04T10:11:57.6179142Z [W1204 09:42:43.226969232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6179146Z 2025-12-04T10:11:57.6179434Z [W1204 09:42:43.231474489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6179437Z 2025-12-04T10:11:57.6179735Z [W1204 09:42:43.231934857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6179738Z 2025-12-04T10:11:57.6180034Z [W1204 09:42:43.232073049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6180037Z 2025-12-04T10:11:57.6180097Z FAILED [0.4135s] [100%] 2025-12-04T10:11:57.6180101Z 2025-12-04T10:11:57.6180184Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6180494Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6180606Z Traceback (most recent call last): 2025-12-04T10:11:57.6180914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6180979Z method(*args, **kwargs) 2025-12-04T10:11:57.6181480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6187853Z method(*args, **kwargs) 2025-12-04T10:11:57.6188257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6188327Z with policy(): 2025-12-04T10:11:57.6188655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6188736Z raise RuntimeError(msg) 2025-12-04T10:11:57.6189557Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6189562Z 2025-12-04T10:11:57.6189704Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6190239Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6190364Z 2025-12-04T10:11:57.6190542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6190681Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6190783Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6191184Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6191321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6191386Z graph_break [] 2025-12-04T10:11:57.6191517Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6192226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6192312Z if out == self.unknown_value: 2025-12-04T10:11:57.6192619Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6192704Z Traceback (most recent call last): 2025-12-04T10:11:57.6193013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6193081Z method(*args, **kwargs) 2025-12-04T10:11:57.6193378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6193443Z method(*args, **kwargs) 2025-12-04T10:11:57.6193730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6193796Z with policy(): 2025-12-04T10:11:57.6194093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6194164Z raise RuntimeError(msg) 2025-12-04T10:11:57.6194978Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6195024Z 2025-12-04T10:11:57.6195160Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6195689Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6195693Z 2025-12-04T10:11:57.6195856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6195994Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6196093Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6196442Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6196581Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6196640Z graph_break [] 2025-12-04T10:11:57.6196770Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6197472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6197544Z if out == self.unknown_value: 2025-12-04T10:11:57.6197741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6197834Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6197960Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6198304Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6198397Z graph_break [] 2025-12-04T10:11:57.6198484Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6198779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6198857Z Traceback (most recent call last): 2025-12-04T10:11:57.6199161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6199227Z method(*args, **kwargs) 2025-12-04T10:11:57.6199523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6199587Z method(*args, **kwargs) 2025-12-04T10:11:57.6199963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6200033Z with policy(): 2025-12-04T10:11:57.6200330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6200398Z raise RuntimeError(msg) 2025-12-04T10:11:57.6201355Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6201360Z 2025-12-04T10:11:57.6201497Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6202024Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6202069Z 2025-12-04T10:11:57.6202231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6202363Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6202456Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6202815Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6202949Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6203009Z graph_break [] 2025-12-04T10:11:57.6203143Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6203832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6203906Z if out == self.unknown_value: 2025-12-04T10:11:57.6204037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6204127Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6204250Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6204593Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6204651Z graph_break [] 2025-12-04T10:11:57.6204779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6204933Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6205054Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6205396Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6205488Z graph_break [] 2025-12-04T10:11:57.6205985Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml - 2025-12-04T10:11:57.6206087Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6207380Z FAILED [0.4135s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6207391Z 2025-12-04T10:11:57.6207516Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6208038Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6208042Z 2025-12-04T10:11:57.6208204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6208305Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6208426Z ================== 1 failed, 57 deselected, 2 rerun in 11.82s ================== 2025-12-04T10:11:57.6208487Z Got exit code 1 2025-12-04T10:11:57.6208554Z Retrying single test... 2025-12-04T10:11:57.6208820Z W1204 09:42:49.902000 45823 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6209208Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml 2025-12-04T10:11:57.6209343Z ============================= test session starts ============================== 2025-12-04T10:11:57.6209558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6209625Z cachedir: .pytest_cache 2025-12-04T10:11:57.6209940Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6210017Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6210082Z configfile: pytest.ini 2025-12-04T10:11:57.6210406Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6210535Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6211110Z stepcurrent: skipping 20 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6211184Z Running 1 items in this shard 2025-12-04T10:11:57.6211187Z 2025-12-04T10:11:57.6211914Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:51.957457567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6211926Z 2025-12-04T10:11:57.6212289Z [W1204 09:43:00.086688388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6212293Z 2025-12-04T10:11:57.6212583Z [W1204 09:43:00.086952732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6212587Z 2025-12-04T10:11:57.6212930Z [W1204 09:43:00.092777102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6212935Z 2025-12-04T10:11:57.6213221Z [W1204 09:43:00.093358401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6213224Z 2025-12-04T10:11:57.6213525Z [W1204 09:43:00.093540615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6213528Z 2025-12-04T10:11:57.6213816Z [W1204 09:43:00.098864275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6213822Z 2025-12-04T10:11:57.6214110Z [W1204 09:43:00.099412175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6214114Z 2025-12-04T10:11:57.6214400Z [W1204 09:43:00.099586028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6214406Z 2025-12-04T10:11:57.6214492Z ('RERUN', {'yellow': True}) [10.9920s] [100%] 2025-12-04T10:11:57.6215207Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:43:01.274102924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6215211Z 2025-12-04T10:11:57.6215502Z [W1204 09:43:01.274665314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6215508Z 2025-12-04T10:11:57.6215798Z [W1204 09:43:01.274806496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6215802Z 2025-12-04T10:11:57.6216088Z [W1204 09:43:01.277739646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6216131Z 2025-12-04T10:11:57.6216421Z [W1204 09:43:01.278302666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6216425Z 2025-12-04T10:11:57.6216710Z [W1204 09:43:01.278443288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6216714Z 2025-12-04T10:11:57.6217193Z [W1204 09:43:01.282970466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6217197Z 2025-12-04T10:11:57.6217501Z [W1204 09:43:01.283430913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6217505Z 2025-12-04T10:11:57.6217796Z [W1204 09:43:01.283568816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6217803Z 2025-12-04T10:11:57.6217882Z ('RERUN', {'yellow': True}) [0.4112s] [100%] 2025-12-04T10:11:57.6218599Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:43:01.682474488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6218609Z 2025-12-04T10:11:57.6218894Z [W1204 09:43:01.683048378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6219020Z 2025-12-04T10:11:57.6219312Z [W1204 09:43:01.683189090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6219315Z 2025-12-04T10:11:57.6219604Z [W1204 09:43:01.686098840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6219655Z 2025-12-04T10:11:57.6219941Z [W1204 09:43:01.686647129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6219944Z 2025-12-04T10:11:57.6220232Z [W1204 09:43:01.686785582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6220235Z 2025-12-04T10:11:57.6220517Z [W1204 09:43:01.691297199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6220520Z 2025-12-04T10:11:57.6220812Z [W1204 09:43:01.691752176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6220815Z 2025-12-04T10:11:57.6221101Z [W1204 09:43:01.691888469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6221107Z 2025-12-04T10:11:57.6221172Z FAILED [0.4084s] [100%] 2025-12-04T10:11:57.6221175Z 2025-12-04T10:11:57.6221260Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6221555Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6221633Z Traceback (most recent call last): 2025-12-04T10:11:57.6221945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6222014Z method(*args, **kwargs) 2025-12-04T10:11:57.6222311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6222374Z method(*args, **kwargs) 2025-12-04T10:11:57.6222675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6222789Z with policy(): 2025-12-04T10:11:57.6223082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6223153Z raise RuntimeError(msg) 2025-12-04T10:11:57.6223948Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6223953Z 2025-12-04T10:11:57.6224086Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6224609Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6224616Z 2025-12-04T10:11:57.6224778Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6224911Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6225005Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6225354Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6225480Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6225539Z graph_break [] 2025-12-04T10:11:57.6226052Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6226749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6226868Z if out == self.unknown_value: 2025-12-04T10:11:57.6227168Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6227242Z Traceback (most recent call last): 2025-12-04T10:11:57.6227541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6227604Z method(*args, **kwargs) 2025-12-04T10:11:57.6227892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6227958Z method(*args, **kwargs) 2025-12-04T10:11:57.6228246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6228310Z with policy(): 2025-12-04T10:11:57.6228601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6228670Z raise RuntimeError(msg) 2025-12-04T10:11:57.6229480Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6229484Z 2025-12-04T10:11:57.6229611Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6230138Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6230142Z 2025-12-04T10:11:57.6230299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6230430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6230573Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6230924Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6231052Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6231111Z graph_break [] 2025-12-04T10:11:57.6231235Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6231939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6232008Z if out == self.unknown_value: 2025-12-04T10:11:57.6232135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6232228Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6232353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6232697Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6232754Z graph_break [] 2025-12-04T10:11:57.6232838Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6233130Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6233278Z Traceback (most recent call last): 2025-12-04T10:11:57.6233580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6233644Z method(*args, **kwargs) 2025-12-04T10:11:57.6233962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6234031Z method(*args, **kwargs) 2025-12-04T10:11:57.6234317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6234380Z with policy(): 2025-12-04T10:11:57.6234668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6234733Z raise RuntimeError(msg) 2025-12-04T10:11:57.6235547Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6235551Z 2025-12-04T10:11:57.6235676Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6236197Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6236200Z 2025-12-04T10:11:57.6236353Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6236477Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6236573Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6236915Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6237043Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6237100Z graph_break [] 2025-12-04T10:11:57.6237221Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6237955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6238022Z if out == self.unknown_value: 2025-12-04T10:11:57.6238150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6238239Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6238360Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6238706Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6238764Z graph_break [] 2025-12-04T10:11:57.6238886Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6238983Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6239106Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6239449Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6239517Z graph_break [] 2025-12-04T10:11:57.6240050Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml - 2025-12-04T10:11:57.6240243Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6241546Z FAILED [0.4084s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6241586Z 2025-12-04T10:11:57.6241715Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6242234Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6242238Z 2025-12-04T10:11:57.6242399Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6242505Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6242621Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ================== 2025-12-04T10:11:57.6242694Z Got exit code 1 2025-12-04T10:11:57.6243168Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6243421Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6243694Z W1204 09:43:08.326000 46009 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6244084Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml 2025-12-04T10:11:57.6244186Z ============================= test session starts ============================== 2025-12-04T10:11:57.6244396Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6244481Z cachedir: .pytest_cache 2025-12-04T10:11:57.6244789Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6244904Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6244973Z configfile: pytest.ini 
2025-12-04T10:11:57.6245288Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6245416Z collecting ... collected 58 items / 21 deselected / 37 selected 2025-12-04T10:11:57.6245511Z stepcurrent: skipping 21 already run items. 2025-12-04T10:11:57.6245579Z Running 37 items in this shard 2025-12-04T10:11:57.6245583Z 2025-12-04T10:11:57.6246091Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9485s] [ 2%] 2025-12-04T10:11:57.6246579Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5423s] [ 2%] 2025-12-04T10:11:57.6247027Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5353s] [ 2%] 2025-12-04T10:11:57.6247031Z 2025-12-04T10:11:57.6247114Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6247401Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6247480Z Traceback (most recent call last): 2025-12-04T10:11:57.6247851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6247916Z method(*args, **kwargs) 2025-12-04T10:11:57.6248209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6248308Z method(*args, **kwargs) 2025-12-04T10:11:57.6248596Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6248656Z with policy(): 2025-12-04T10:11:57.6248947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6249017Z raise RuntimeError(msg) 2025-12-04T10:11:57.6249810Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6249814Z 2025-12-04T10:11:57.6249943Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6250464Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6250470Z 2025-12-04T10:11:57.6250629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6250753Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6250848Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6251399Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6251532Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6251590Z graph_break [] 2025-12-04T10:11:57.6251901Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.6252021Z Traceback (most recent call last): 2025-12-04T10:11:57.6252335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6252405Z method(*args, **kwargs) 2025-12-04T10:11:57.6252698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6252767Z method(*args, **kwargs) 2025-12-04T10:11:57.6253054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6253121Z with policy(): 2025-12-04T10:11:57.6253419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6253486Z raise RuntimeError(msg) 2025-12-04T10:11:57.6254294Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6254301Z 2025-12-04T10:11:57.6254434Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6254967Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6254970Z 2025-12-04T10:11:57.6255198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6255333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6255437Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6255981Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6256150Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6256209Z graph_break [] 2025-12-04T10:11:57.6256336Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6256431Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6256554Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6257098Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6257158Z graph_break [] 2025-12-04T10:11:57.6257246Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6257547Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6257624Z 
Traceback (most recent call last): 2025-12-04T10:11:57.6257922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6257991Z method(*args, **kwargs) 2025-12-04T10:11:57.6258280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6258351Z method(*args, **kwargs) 2025-12-04T10:11:57.6258638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6258699Z with policy(): 2025-12-04T10:11:57.6258993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6259098Z raise RuntimeError(msg) 2025-12-04T10:11:57.6259906Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6259911Z 2025-12-04T10:11:57.6260040Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6260563Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6260567Z 2025-12-04T10:11:57.6260732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6260861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6260960Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6261495Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6261623Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6261685Z graph_break [] 2025-12-04T10:11:57.6261808Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6261903Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6262086Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6262621Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6262723Z graph_break [] 2025-12-04T10:11:57.6262850Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6262937Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6263062Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.6263597Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6263661Z graph_break [] 2025-12-04T10:11:57.6264157Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml - 2025-12-04T10:11:57.6264273Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6265564Z FAILED [0.5353s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6265568Z 2025-12-04T10:11:57.6265697Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6266229Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6266233Z 2025-12-04T10:11:57.6266390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6266536Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6266653Z ================== 1 failed, 21 deselected, 2 rerun in 3.05s =================== 2025-12-04T10:11:57.6266714Z Got exit code 1 2025-12-04T10:11:57.6266783Z Retrying single test... 
2025-12-04T10:11:57.6267044Z W1204 09:43:18.011000 46191 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6267440Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml 2025-12-04T10:11:57.6267538Z ============================= test session starts ============================== 2025-12-04T10:11:57.6267747Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6267817Z cachedir: .pytest_cache 2025-12-04T10:11:57.6268120Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6268208Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6268274Z configfile: pytest.ini 2025-12-04T10:11:57.6268593Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6268727Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6269292Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6269446Z Running 1 items in this shard 2025-12-04T10:11:57.6269455Z 2025-12-04T10:11:57.6270182Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:19.611299086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6270224Z 2025-12-04T10:11:57.6270524Z [W1204 09:43:28.666942859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6270528Z 2025-12-04T10:11:57.6270820Z [W1204 09:43:28.667200783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6270823Z 2025-12-04T10:11:57.6271109Z [W1204 09:43:28.673340288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6271112Z 2025-12-04T10:11:57.6271405Z [W1204 09:43:28.673991619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6271408Z 2025-12-04T10:11:57.6271691Z [W1204 09:43:28.674171112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6271697Z 2025-12-04T10:11:57.6271985Z [W1204 09:43:28.679777108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6271988Z 2025-12-04T10:11:57.6272273Z [W1204 09:43:28.680356108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6272276Z 2025-12-04T10:11:57.6272564Z [W1204 09:43:28.680529461 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6272568Z 2025-12-04T10:11:57.6272654Z ('RERUN', {'yellow': True}) [11.0061s] [100%] 2025-12-04T10:11:57.6273371Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:29.475063069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6273414Z 2025-12-04T10:11:57.6273702Z [W1204 09:43:29.475604259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6273705Z 2025-12-04T10:11:57.6273988Z [W1204 09:43:29.475741531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6273991Z 2025-12-04T10:11:57.6274279Z [W1204 09:43:29.478604750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6274282Z 2025-12-04T10:11:57.6274573Z [W1204 09:43:29.479048837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6274576Z 2025-12-04T10:11:57.6274867Z [W1204 09:43:29.479185610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6274872Z 2025-12-04T10:11:57.6275161Z [W1204 09:43:29.483671307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6275164Z 2025-12-04T10:11:57.6275454Z [W1204 09:43:29.484138415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6275457Z 2025-12-04T10:11:57.6275755Z [W1204 09:43:29.484275387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6275759Z 2025-12-04T10:11:57.6275911Z ('RERUN', {'yellow': True}) [0.4949s] [100%] 2025-12-04T10:11:57.6276630Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:30.969627815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6276668Z 2025-12-04T10:11:57.6276955Z [W1204 09:43:30.970201705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6276962Z 2025-12-04T10:11:57.6277249Z [W1204 09:43:30.970346997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6277252Z 2025-12-04T10:11:57.6277536Z [W1204 09:43:30.973252127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6277540Z 2025-12-04T10:11:57.6277832Z [W1204 09:43:30.973704675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6277836Z 2025-12-04T10:11:57.6278121Z [W1204 09:43:30.973845537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6278127Z 2025-12-04T10:11:57.6278420Z [W1204 09:43:30.978381084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6278423Z 2025-12-04T10:11:57.6278707Z [W1204 09:43:30.978840542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6278718Z 2025-12-04T10:11:57.6279003Z [W1204 09:43:30.978979215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6279007Z 2025-12-04T10:11:57.6279070Z FAILED [0.4894s] [100%] 2025-12-04T10:11:57.6279076Z 2025-12-04T10:11:57.6279167Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6279459Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6279541Z Traceback (most recent call last): 2025-12-04T10:11:57.6279934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6280002Z method(*args, **kwargs) 2025-12-04T10:11:57.6280306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6280369Z method(*args, **kwargs) 2025-12-04T10:11:57.6280658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6280721Z with policy(): 2025-12-04T10:11:57.6281019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6281089Z raise RuntimeError(msg) 2025-12-04T10:11:57.6281886Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6281893Z 2025-12-04T10:11:57.6282025Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6282546Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6282549Z 2025-12-04T10:11:57.6282712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6282915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6283014Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6283565Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6283731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6283790Z graph_break [] 2025-12-04T10:11:57.6283918Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6284607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6284685Z if out == self.unknown_value: 2025-12-04T10:11:57.6284981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6285056Z Traceback (most recent call last): 2025-12-04T10:11:57.6285355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6285424Z method(*args, **kwargs) 2025-12-04T10:11:57.6285713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6285779Z method(*args, **kwargs) 2025-12-04T10:11:57.6286064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6286126Z with policy(): 2025-12-04T10:11:57.6286419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6286488Z raise RuntimeError(msg) 2025-12-04T10:11:57.6287294Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6287337Z 2025-12-04T10:11:57.6287477Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6287998Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6288002Z 2025-12-04T10:11:57.6288160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6288287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6288387Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6288932Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6289065Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6289123Z graph_break [] 2025-12-04T10:11:57.6289247Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6289936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6290007Z if out == self.unknown_value: 2025-12-04T10:11:57.6290201Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6290294Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6290416Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6290956Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6291051Z graph_break [] 2025-12-04T10:11:57.6291139Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6291428Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6291503Z Traceback (most recent call last): 2025-12-04T10:11:57.6291801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6291867Z method(*args, **kwargs) 2025-12-04T10:11:57.6292155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6292222Z method(*args, **kwargs) 2025-12-04T10:11:57.6292511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6292576Z with policy(): 2025-12-04T10:11:57.6292869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6292936Z raise RuntimeError(msg) 2025-12-04T10:11:57.6293748Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6293756Z 2025-12-04T10:11:57.6293883Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6294408Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6294465Z 2025-12-04T10:11:57.6294622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6294750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6294843Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6295382Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6295514Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6295572Z graph_break [] 2025-12-04T10:11:57.6295707Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6296401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6296474Z if out == self.unknown_value: 2025-12-04T10:11:57.6296601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6296690Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6296811Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6297418Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6297483Z graph_break [] 2025-12-04T10:11:57.6297608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6297697Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6297855Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6298398Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6298457Z graph_break [] 2025-12-04T10:11:57.6298948Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml - 2025-12-04T10:11:57.6299058Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6300337Z FAILED [0.4894s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6300351Z 2025-12-04T10:11:57.6300475Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6300995Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6300999Z 2025-12-04T10:11:57.6301162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6301268Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6301387Z ================== 1 failed, 57 deselected, 2 rerun in 12.01s ================== 2025-12-04T10:11:57.6301451Z Got exit code 1 2025-12-04T10:11:57.6301552Z Retrying single test... 2025-12-04T10:11:57.6301821Z W1204 09:43:36.589000 46377 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6302211Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml 2025-12-04T10:11:57.6302306Z ============================= test session starts ============================== 2025-12-04T10:11:57.6302519Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6302586Z cachedir: .pytest_cache 2025-12-04T10:11:57.6302896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6302973Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6303040Z configfile: pytest.ini 2025-12-04T10:11:57.6303358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6303489Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6304061Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6304136Z Running 1 items in this shard 2025-12-04T10:11:57.6304140Z 2025-12-04T10:11:57.6304928Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:38.196651915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6304933Z 2025-12-04T10:11:57.6305232Z [W1204 09:43:47.181403929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6305272Z 2025-12-04T10:11:57.6305559Z [W1204 09:43:47.181675273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6305563Z 2025-12-04T10:11:57.6305850Z [W1204 09:43:47.187821698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6305854Z 2025-12-04T10:11:57.6306138Z [W1204 09:43:47.188432369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6306141Z 2025-12-04T10:11:57.6306434Z [W1204 09:43:47.188618582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6306437Z 2025-12-04T10:11:57.6306723Z [W1204 09:43:47.194156357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6306729Z 2025-12-04T10:11:57.6307017Z [W1204 09:43:47.194699746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6307021Z 2025-12-04T10:11:57.6307304Z [W1204 09:43:47.194861899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6307307Z 2025-12-04T10:11:57.6307389Z ('RERUN', {'yellow': True}) [10.9491s] [100%] 2025-12-04T10:11:57.6308110Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:48.999467135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6308113Z 2025-12-04T10:11:57.6308400Z [W1204 09:43:48.000018844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6308440Z 2025-12-04T10:11:57.6308729Z [W1204 09:43:48.000166587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6308732Z 2025-12-04T10:11:57.6309020Z [W1204 09:43:48.003081506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6309023Z 2025-12-04T10:11:57.6309310Z [W1204 09:43:48.003531734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6309313Z 2025-12-04T10:11:57.6309599Z [W1204 09:43:48.003670716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6309602Z 2025-12-04T10:11:57.6309893Z [W1204 09:43:48.008175393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6309897Z 2025-12-04T10:11:57.6310181Z [W1204 09:43:48.008639441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6310184Z 2025-12-04T10:11:57.6310477Z [W1204 09:43:48.008776924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6310480Z 2025-12-04T10:11:57.6310562Z ('RERUN', {'yellow': True}) [0.5063s] [100%] 2025-12-04T10:11:57.6311339Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:48.505779800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6311343Z 2025-12-04T10:11:57.6311636Z [W1204 09:43:48.506315949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6311673Z 2025-12-04T10:11:57.6311958Z [W1204 09:43:48.506456432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6311961Z 2025-12-04T10:11:57.6312248Z [W1204 09:43:48.509360101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6312251Z 2025-12-04T10:11:57.6312535Z [W1204 09:43:48.509810839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6312538Z 2025-12-04T10:11:57.6312828Z [W1204 09:43:48.509954611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6312831Z 2025-12-04T10:11:57.6313127Z [W1204 09:43:48.514430778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6313132Z 2025-12-04T10:11:57.6313425Z [W1204 09:43:48.514887916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6313428Z 2025-12-04T10:11:57.6313713Z [W1204 09:43:48.515024878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6313716Z 2025-12-04T10:11:57.6313778Z FAILED [0.5022s] [100%] 2025-12-04T10:11:57.6313781Z 2025-12-04T10:11:57.6313868Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6314163Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6314242Z Traceback (most recent call last): 2025-12-04T10:11:57.6314548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6314613Z method(*args, **kwargs) 2025-12-04T10:11:57.6314911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6315013Z method(*args, **kwargs) 2025-12-04T10:11:57.6315307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6315366Z with policy(): 2025-12-04T10:11:57.6315657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6315728Z raise RuntimeError(msg) 2025-12-04T10:11:57.6316527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6316532Z 2025-12-04T10:11:57.6316664Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6317491Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6317496Z 2025-12-04T10:11:57.6317670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6317807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6317903Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6318582Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6318722Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6318848Z graph_break [] 2025-12-04T10:11:57.6318981Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6319675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6319755Z if out == self.unknown_value: 2025-12-04T10:11:57.6320089Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6320167Z Traceback (most recent call last): 2025-12-04T10:11:57.6320473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6320539Z method(*args, **kwargs) 2025-12-04T10:11:57.6320826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6320897Z method(*args, **kwargs) 2025-12-04T10:11:57.6321180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6321244Z with policy(): 2025-12-04T10:11:57.6321546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6321612Z raise RuntimeError(msg) 2025-12-04T10:11:57.6322426Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6322430Z 2025-12-04T10:11:57.6322558Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6323077Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6323136Z 2025-12-04T10:11:57.6323296Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6323427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6323519Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6324062Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6324200Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6324259Z graph_break [] 2025-12-04T10:11:57.6324384Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6325074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6325154Z if out == self.unknown_value: 2025-12-04T10:11:57.6325284Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6325375Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6325499Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6326105Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6326166Z graph_break [] 2025-12-04T10:11:57.6326289Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6326586Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6326661Z Traceback (most recent call last): 2025-12-04T10:11:57.6326977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6327042Z method(*args, **kwargs) 2025-12-04T10:11:57.6327337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6327406Z method(*args, **kwargs) 2025-12-04T10:11:57.6327698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6327765Z with policy(): 2025-12-04T10:11:57.6328066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6328139Z raise RuntimeError(msg) 2025-12-04T10:11:57.6328965Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6328969Z 2025-12-04T10:11:57.6329096Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6329623Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6330239Z 2025-12-04T10:11:57.6330408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6330783Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6331299Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6332036Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6332813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6333084Z graph_break [] 2025-12-04T10:11:57.6333309Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6334226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6335059Z if out == self.unknown_value: 2025-12-04T10:11:57.6335315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6335623Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6335924Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6336666Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6337347Z graph_break [] 2025-12-04T10:11:57.6337569Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6337959Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6338252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6338995Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6339710Z graph_break [] 2025-12-04T10:11:57.6340291Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml - 2025-12-04T10:11:57.6340963Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6342444Z FAILED [0.5022s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6343795Z 2025-12-04T10:11:57.6343926Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6344661Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6345252Z 2025-12-04T10:11:57.6345417Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6345755Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6346057Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ================== 2025-12-04T10:11:57.6346329Z Got exit code 1 2025-12-04T10:11:57.6346905Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6347695Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6348331Z W1204 09:43:55.153000 46564 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6349053Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml 2025-12-04T10:11:57.6349611Z ============================= test session starts ============================== 2025-12-04T10:11:57.6349993Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6350346Z cachedir: .pytest_cache 2025-12-04T10:11:57.6350764Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6351220Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6351430Z configfile: pytest.ini 2025-12-04T10:11:57.6351859Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6352383Z collecting ... collected 58 items / 22 deselected / 36 selected 2025-12-04T10:11:57.6352672Z stepcurrent: skipping 22 already run items. 2025-12-04T10:11:57.6352899Z Running 36 items in this shard 2025-12-04T10:11:57.6353025Z 2025-12-04T10:11:57.6353535Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0162s] [ 2%] 2025-12-04T10:11:57.6354677Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6129s] [ 2%] 2025-12-04T10:11:57.6355694Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6145s] [ 2%] 2025-12-04T10:11:57.6356259Z 2025-12-04T10:11:57.6356347Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6356811Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6357258Z Traceback (most recent call last): 2025-12-04T10:11:57.6357716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6358172Z method(*args, **kwargs) 2025-12-04T10:11:57.6358586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.6359025Z method(*args, **kwargs) 2025-12-04T10:11:57.6359426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6359858Z with policy(): 2025-12-04T10:11:57.6360386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6360824Z raise RuntimeError(msg) 2025-12-04T10:11:57.6361753Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6362639Z 2025-12-04T10:11:57.6362769Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6363501Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6364097Z 2025-12-04T10:11:57.6364263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6364695Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6365001Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6365526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6366084Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6366350Z graph_break [] 2025-12-04T10:11:57.6366743Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.6367186Z Traceback (most recent call last): 2025-12-04T10:11:57.6367627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6368057Z method(*args, **kwargs) 2025-12-04T10:11:57.6368477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6368910Z method(*args, **kwargs) 2025-12-04T10:11:57.6369312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6369734Z with policy(): 2025-12-04T10:11:57.6370122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6370560Z raise RuntimeError(msg) 2025-12-04T10:11:57.6371591Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6372523Z 2025-12-04T10:11:57.6372652Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6373378Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6373982Z 2025-12-04T10:11:57.6374141Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6374504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6374803Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6375326Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6375875Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6376144Z graph_break [] 2025-12-04T10:11:57.6376365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6376670Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6376962Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6377508Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6378003Z graph_break [] 2025-12-04T10:11:57.6378182Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6378649Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6379090Z Traceback (most recent call last): 2025-12-04T10:11:57.6379529Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6379969Z method(*args, **kwargs) 2025-12-04T10:11:57.6380427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6380868Z method(*args, **kwargs) 2025-12-04T10:11:57.6381272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6381702Z with policy(): 2025-12-04T10:11:57.6382100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6382544Z raise RuntimeError(msg) 2025-12-04T10:11:57.6383487Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6384380Z 2025-12-04T10:11:57.6384513Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6385238Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6385839Z 2025-12-04T10:11:57.6385996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6386358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6386656Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6387242Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6387801Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6388071Z graph_break [] 2025-12-04T10:11:57.6388321Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6388622Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6388912Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6389462Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6389940Z graph_break [] 2025-12-04T10:11:57.6390154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6390452Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6390739Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6391279Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6391764Z graph_break [] 2025-12-04T10:11:57.6392339Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml - 2025-12-04T10:11:57.6392997Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6394485Z FAILED [0.6145s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6395856Z 2025-12-04T10:11:57.6395984Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6396714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6397358Z 2025-12-04T10:11:57.6397516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6397857Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6398157Z ================== 1 failed, 22 deselected, 2 rerun in 3.27s =================== 2025-12-04T10:11:57.6398422Z Got exit code 1 2025-12-04T10:11:57.6398586Z Retrying single test... 
2025-12-04T10:11:57.6398964Z W1204 09:44:04.996000 46753 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6399693Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml 2025-12-04T10:11:57.6400295Z ============================= test session starts ============================== 2025-12-04T10:11:57.6400682Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6401029Z cachedir: .pytest_cache 2025-12-04T10:11:57.6401457Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6401911Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6402122Z configfile: pytest.ini 2025-12-04T10:11:57.6402547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6403136Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6403918Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6404670Z Running 1 items in this shard 2025-12-04T10:11:57.6404804Z 2025-12-04T10:11:57.6405536Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:06.226747132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6406350Z 2025-12-04T10:11:57.6406666Z [W1204 09:44:15.367819867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6407050Z 2025-12-04T10:11:57.6407348Z [W1204 09:44:15.368078722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6407723Z 2025-12-04T10:11:57.6408018Z [W1204 09:44:15.374006353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6408393Z 2025-12-04T10:11:57.6408688Z [W1204 09:44:15.374596463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6409061Z 2025-12-04T10:11:57.6409351Z [W1204 09:44:15.374765436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6409727Z 2025-12-04T10:11:57.6410017Z [W1204 09:44:15.380166038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6410393Z 2025-12-04T10:11:57.6410686Z [W1204 09:44:15.380708648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6411056Z 2025-12-04T10:11:57.6411353Z [W1204 09:44:15.380870440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6411764Z 2025-12-04T10:11:57.6411856Z ('RERUN', {'yellow': True}) [11.1794s] [100%] 2025-12-04T10:11:57.6412753Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:16.731716380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6413560Z 2025-12-04T10:11:57.6413855Z [W1204 09:44:16.732239558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6414236Z 2025-12-04T10:11:57.6414534Z [W1204 09:44:16.732388701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6414903Z 2025-12-04T10:11:57.6415197Z [W1204 09:44:16.735272500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6415568Z 2025-12-04T10:11:57.6415862Z [W1204 09:44:16.735823010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6416237Z 2025-12-04T10:11:57.6416526Z [W1204 09:44:16.735963042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6416899Z 2025-12-04T10:11:57.6417436Z [W1204 09:44:16.740510320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6417813Z 2025-12-04T10:11:57.6418235Z [W1204 09:44:16.740966347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6418628Z 2025-12-04T10:11:57.6418920Z [W1204 09:44:16.741107650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6419347Z 2025-12-04T10:11:57.6419436Z ('RERUN', {'yellow': True}) [0.5915s] [100%] 2025-12-04T10:11:57.6420318Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:17.321714799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6421123Z 2025-12-04T10:11:57.6421421Z [W1204 09:44:17.322235648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6421799Z 2025-12-04T10:11:57.6422096Z [W1204 09:44:17.322375030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6422473Z 2025-12-04T10:11:57.6422765Z [W1204 09:44:17.325284540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6423139Z 2025-12-04T10:11:57.6423435Z [W1204 09:44:17.325827559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6423803Z 2025-12-04T10:11:57.6424098Z [W1204 09:44:17.325966372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6424464Z 2025-12-04T10:11:57.6424758Z [W1204 09:44:17.330438319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6425129Z 2025-12-04T10:11:57.6425420Z [W1204 09:44:17.330894746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6425794Z 2025-12-04T10:11:57.6426088Z [W1204 09:44:17.331033159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6426460Z 2025-12-04T10:11:57.6426583Z FAILED [0.5922s] [100%] 2025-12-04T10:11:57.6426690Z 2025-12-04T10:11:57.6426779Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6427252Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6427714Z Traceback (most recent call last): 2025-12-04T10:11:57.6428175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6428618Z method(*args, **kwargs) 2025-12-04T10:11:57.6429035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6429484Z method(*args, **kwargs) 2025-12-04T10:11:57.6429895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6430323Z with policy(): 2025-12-04T10:11:57.6430721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6431167Z raise RuntimeError(msg) 2025-12-04T10:11:57.6432093Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6432982Z 2025-12-04T10:11:57.6433115Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6433944Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6434553Z 2025-12-04T10:11:57.6434748Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6435123Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6435428Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6435949Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6436515Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6436785Z graph_break [] 2025-12-04T10:11:57.6437001Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6437916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6438759Z if out == self.unknown_value: 2025-12-04T10:11:57.6439179Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6439632Z Traceback (most recent call last): 2025-12-04T10:11:57.6440121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6440564Z method(*args, **kwargs) 2025-12-04T10:11:57.6440967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6441422Z method(*args, **kwargs) 2025-12-04T10:11:57.6441829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6442257Z with policy(): 2025-12-04T10:11:57.6442643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6443127Z raise RuntimeError(msg) 2025-12-04T10:11:57.6444092Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6444989Z 2025-12-04T10:11:57.6445119Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6445846Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6446450Z 2025-12-04T10:11:57.6446610Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6446976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6447280Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6447797Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6448350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6448620Z graph_break [] 2025-12-04T10:11:57.6448832Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6449820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6450653Z if out == self.unknown_value: 2025-12-04T10:11:57.6450903Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6451238Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6451534Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6452082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6452568Z graph_break [] 2025-12-04T10:11:57.6452742Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6453218Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6453667Z Traceback (most recent call last): 2025-12-04T10:11:57.6454115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6454552Z method(*args, **kwargs) 2025-12-04T10:11:57.6454960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6455398Z method(*args, **kwargs) 2025-12-04T10:11:57.6455801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6456231Z with policy(): 2025-12-04T10:11:57.6456619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6457060Z raise RuntimeError(msg) 2025-12-04T10:11:57.6458005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6458909Z 2025-12-04T10:11:57.6459035Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6459806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6460403Z 2025-12-04T10:11:57.6460566Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6460924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6461228Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6461762Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6462310Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6462571Z graph_break [] 2025-12-04T10:11:57.6462787Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6463686Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6464522Z if out == self.unknown_value: 2025-12-04T10:11:57.6464767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6465076Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6465491Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6466195Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6466865Z graph_break [] 2025-12-04T10:11:57.6467173Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6467601Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6468047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6468717Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6469252Z graph_break [] 2025-12-04T10:11:57.6469958Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml - 2025-12-04T10:11:57.6470739Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6472334Z FAILED [0.5922s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6473722Z 2025-12-04T10:11:57.6473989Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6474808Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6475435Z 2025-12-04T10:11:57.6475623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6476131Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6476528Z ================== 1 failed, 57 deselected, 2 rerun in 12.39s ================== 2025-12-04T10:11:57.6476857Z Got exit code 1 2025-12-04T10:11:57.6477154Z Retrying single test... 2025-12-04T10:11:57.6477663Z W1204 09:44:23.934000 46947 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6478481Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml 2025-12-04T10:11:57.6479159Z ============================= test session starts ============================== 2025-12-04T10:11:57.6479637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6480139Z cachedir: .pytest_cache 2025-12-04T10:11:57.6480711Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6481228Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6481543Z configfile: pytest.ini 2025-12-04T10:11:57.6482118Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6482744Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6483565Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6484441Z Running 1 items in this shard 2025-12-04T10:11:57.6484599Z 2025-12-04T10:11:57.6485492Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:25.182186073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6486333Z 2025-12-04T10:11:57.6486737Z [W1204 09:44:34.267724860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6487198Z 2025-12-04T10:11:57.6487574Z [W1204 09:44:34.267991195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6487978Z 2025-12-04T10:11:57.6488302Z [W1204 09:44:34.273918746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6488722Z 2025-12-04T10:11:57.6489099Z [W1204 09:44:34.274515536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6489565Z 2025-12-04T10:11:57.6489890Z [W1204 09:44:34.274692419 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6490320Z 2025-12-04T10:11:57.6490642Z [W1204 09:44:34.280217624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6491029Z 2025-12-04T10:11:57.6491463Z [W1204 09:44:34.280753093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6491866Z 2025-12-04T10:11:57.6492220Z [W1204 09:44:34.280918636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6492619Z 2025-12-04T10:11:57.6492732Z ('RERUN', {'yellow': True}) [11.1387s] [100%] 2025-12-04T10:11:57.6493787Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:35.635248313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6494653Z 2025-12-04T10:11:57.6494977Z [W1204 09:44:35.635773042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6495382Z 2025-12-04T10:11:57.6495776Z [W1204 09:44:35.635910984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6496246Z 2025-12-04T10:11:57.6496612Z [W1204 09:44:35.638895245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6497016Z 2025-12-04T10:11:57.6497337Z [W1204 09:44:35.639461015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6497773Z 2025-12-04T10:11:57.6498083Z [W1204 09:44:35.639600088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6498611Z 2025-12-04T10:11:57.6498936Z [W1204 09:44:35.644200817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6499368Z 2025-12-04T10:11:57.6499690Z [W1204 09:44:35.644679575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6500093Z 2025-12-04T10:11:57.6500494Z [W1204 09:44:35.644816158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6500908Z 2025-12-04T10:11:57.6501050Z ('RERUN', {'yellow': True}) [0.5994s] [100%] 2025-12-04T10:11:57.6502078Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:36.233581086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6502935Z 2025-12-04T10:11:57.6503303Z [W1204 09:44:36.234118305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6503752Z 2025-12-04T10:11:57.6504111Z [W1204 09:44:36.234260498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6504528Z 2025-12-04T10:11:57.6504883Z [W1204 09:44:36.237235039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6505280Z 2025-12-04T10:11:57.6505701Z [W1204 09:44:36.237793578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6506103Z 2025-12-04T10:11:57.6506427Z [W1204 09:44:36.237931541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6506873Z 2025-12-04T10:11:57.6507191Z [W1204 09:44:36.242523190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6507651Z 2025-12-04T10:11:57.6507986Z [W1204 09:44:36.242988228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6508433Z 2025-12-04T10:11:57.6508755Z [W1204 09:44:36.243123640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6509156Z 2025-12-04T10:11:57.6509272Z FAILED [0.6007s] [100%] 2025-12-04T10:11:57.6509449Z 2025-12-04T10:11:57.6509581Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6510170Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6510698Z Traceback (most recent call last): 2025-12-04T10:11:57.6511321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6511827Z method(*args, **kwargs) 2025-12-04T10:11:57.6512315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6512967Z method(*args, **kwargs) 2025-12-04T10:11:57.6513476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6513954Z with policy(): 2025-12-04T10:11:57.6514522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6515061Z raise RuntimeError(msg) 2025-12-04T10:11:57.6516120Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6517212Z 2025-12-04T10:11:57.6517386Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6518224Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6518872Z 2025-12-04T10:11:57.6519137Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6519623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6520035Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6520819Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6521507Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6521883Z graph_break [] 2025-12-04T10:11:57.6522207Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6523301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6524247Z if out == self.unknown_value: 2025-12-04T10:11:57.6524820Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6525338Z Traceback (most recent call last): 2025-12-04T10:11:57.6525895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6526498Z method(*args, **kwargs) 2025-12-04T10:11:57.6527012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6527506Z method(*args, **kwargs) 2025-12-04T10:11:57.6528057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6528596Z with policy(): 2025-12-04T10:11:57.6529083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6529651Z raise RuntimeError(msg) 2025-12-04T10:11:57.6530711Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6531672Z 2025-12-04T10:11:57.6531818Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6532714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6533413Z 2025-12-04T10:11:57.6533648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6534064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6534536Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6535147Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6535821Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6536180Z graph_break [] 2025-12-04T10:11:57.6536488Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6537509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6538440Z if out == self.unknown_value: 2025-12-04T10:11:57.6538790Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6539226Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6539636Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6540250Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6540869Z graph_break [] 2025-12-04T10:11:57.6541222Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6541800Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6542378Z Traceback (most recent call last): 2025-12-04T10:11:57.6542932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6543481Z method(*args, **kwargs) 2025-12-04T10:11:57.6544029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6544519Z method(*args, **kwargs) 2025-12-04T10:11:57.6545037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6545616Z with policy(): 2025-12-04T10:11:57.6546075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6546626Z raise RuntimeError(msg) 2025-12-04T10:11:57.6547762Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6548875Z 2025-12-04T10:11:57.6549121Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6550038Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6550714Z 2025-12-04T10:11:57.6550921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6551408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6551785Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6552481Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6553139Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6553483Z graph_break [] 2025-12-04T10:11:57.6553886Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6554884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6555764Z if out == self.unknown_value: 2025-12-04T10:11:57.6556197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6556587Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6557005Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6557681Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6558367Z graph_break [] 2025-12-04T10:11:57.6558780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6559219Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6559624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6560363Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6560948Z graph_break [] 2025-12-04T10:11:57.6561670Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml - 2025-12-04T10:11:57.6562482Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6564134Z FAILED [0.6007s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6565580Z 2025-12-04T10:11:57.6565738Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6566618Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6567247Z 2025-12-04T10:11:57.6567485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6567902Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6568340Z ================== 1 failed, 57 deselected, 2 rerun in 12.36s ================== 2025-12-04T10:11:57.6568703Z Got exit code 1 2025-12-04T10:11:57.6569332Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6570272Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6570972Z W1204 09:44:42.903000 47141 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6571779Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml 2025-12-04T10:11:57.6572492Z ============================= test session starts ============================== 2025-12-04T10:11:57.6573003Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6573439Z cachedir: .pytest_cache 2025-12-04T10:11:57.6574052Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6574611Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6574872Z configfile: 
pytest.ini 2025-12-04T10:11:57.6575470Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6576084Z collecting ... collected 58 items / 23 deselected / 35 selected 2025-12-04T10:11:57.6576438Z stepcurrent: skipping 23 already run items. 2025-12-04T10:11:57.6576920Z Running 35 items in this shard 2025-12-04T10:11:57.6577078Z 2025-12-04T10:11:57.6577647Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8470s] [ 2%] 2025-12-04T10:11:57.6578829Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4487s] [ 2%] 2025-12-04T10:11:57.6579960Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4446s] [ 2%] 2025-12-04T10:11:57.6580515Z 2025-12-04T10:11:57.6580706Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6581316Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6581867Z Traceback (most recent call last): 2025-12-04T10:11:57.6582430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6583006Z method(*args, **kwargs) 2025-12-04T10:11:57.6583527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6584074Z method(*args, **kwargs) 2025-12-04T10:11:57.6584644Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6585185Z with policy(): 2025-12-04T10:11:57.6585688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6586276Z raise RuntimeError(msg) 2025-12-04T10:11:57.6587274Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6588239Z 2025-12-04T10:11:57.6588400Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6589265Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6589921Z 2025-12-04T10:11:57.6590129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6590560Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6595631Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6596234Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6596802Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6597158Z graph_break [] 2025-12-04T10:11:57.6597562Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6598016Z Traceback (most recent call last): 2025-12-04T10:11:57.6598490Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6598944Z method(*args, **kwargs) 2025-12-04T10:11:57.6599360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6599804Z method(*args, **kwargs) 2025-12-04T10:11:57.6600305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6600752Z with policy(): 2025-12-04T10:11:57.6601163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6601614Z raise RuntimeError(msg) 2025-12-04T10:11:57.6602558Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6603441Z 2025-12-04T10:11:57.6603614Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6604435Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6605041Z 2025-12-04T10:11:57.6605204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6605631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6605946Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6606471Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6607024Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6607295Z graph_break [] 2025-12-04T10:11:57.6607515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6607821Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6608115Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6608659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6609134Z graph_break [] 2025-12-04T10:11:57.6609311Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6609781Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6610233Z Traceback (most recent call last): 2025-12-04T10:11:57.6610689Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6611150Z method(*args, **kwargs) 2025-12-04T10:11:57.6611571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6612004Z method(*args, **kwargs) 2025-12-04T10:11:57.6612408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6612840Z with policy(): 2025-12-04T10:11:57.6613234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6613722Z raise RuntimeError(msg) 2025-12-04T10:11:57.6614653Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6615533Z 2025-12-04T10:11:57.6615669Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6616396Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6617203Z 2025-12-04T10:11:57.6617380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6617759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6618064Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6618574Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6619120Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6619382Z graph_break [] 2025-12-04T10:11:57.6619597Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6619893Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6620328Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6620880Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6621403Z graph_break [] 2025-12-04T10:11:57.6621612Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6621907Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6622195Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6622732Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6623205Z graph_break [] 2025-12-04T10:11:57.6623789Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml - 2025-12-04T10:11:57.6624459Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6625929Z FAILED [0.4446s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6627280Z 2025-12-04T10:11:57.6627409Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6628138Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6628739Z 2025-12-04T10:11:57.6628899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6629242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6629539Z ================== 1 failed, 23 deselected, 2 rerun in 2.76s =================== 2025-12-04T10:11:57.6629852Z Got exit code 1 2025-12-04T10:11:57.6630025Z Retrying single test... 
2025-12-04T10:11:57.6630399Z W1204 09:44:52.633000 47329 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6631118Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml 2025-12-04T10:11:57.6631223Z ============================= test session starts ============================== 2025-12-04T10:11:57.6631437Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6631504Z cachedir: .pytest_cache 2025-12-04T10:11:57.6631814Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6631896Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6631970Z configfile: pytest.ini 2025-12-04T10:11:57.6632288Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6632419Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6632991Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6633065Z Running 1 items in this shard 2025-12-04T10:11:57.6633069Z 2025-12-04T10:11:57.6633866Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:44:53.685392677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6633902Z 2025-12-04T10:11:57.6634207Z [W1204 09:45:02.645211950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6634210Z 2025-12-04T10:11:57.6634508Z [W1204 09:45:02.645474164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6634511Z 2025-12-04T10:11:57.6634797Z [W1204 09:45:02.651338124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6634800Z 2025-12-04T10:11:57.6635103Z [W1204 09:45:02.651914024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6635110Z 2025-12-04T10:11:57.6635398Z [W1204 09:45:02.652082247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6635402Z 2025-12-04T10:11:57.6635691Z [W1204 09:45:02.657506160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6635696Z 2025-12-04T10:11:57.6635989Z [W1204 09:45:02.658033379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6635993Z 2025-12-04T10:11:57.6636278Z [W1204 09:45:02.658188591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6636282Z 2025-12-04T10:11:57.6636367Z ('RERUN', {'yellow': True}) [10.8188s] [100%] 2025-12-04T10:11:57.6637091Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:03.834596647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6637096Z 2025-12-04T10:11:57.6637391Z [W1204 09:45:03.835130266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6637512Z 2025-12-04T10:11:57.6637796Z [W1204 09:45:03.835274478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6637800Z 2025-12-04T10:11:57.6638091Z [W1204 09:45:03.838197638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6638095Z 2025-12-04T10:11:57.6638383Z [W1204 09:45:03.838765848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6638387Z 2025-12-04T10:11:57.6638672Z [W1204 09:45:03.838907320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6638679Z 2025-12-04T10:11:57.6638976Z [W1204 09:45:03.843463178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6638982Z 2025-12-04T10:11:57.6639271Z [W1204 09:45:03.843920596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6639274Z 2025-12-04T10:11:57.6639563Z [W1204 09:45:03.844057148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6639566Z 2025-12-04T10:11:57.6639646Z ('RERUN', {'yellow': True}) [0.4201s] [100%] 2025-12-04T10:11:57.6640483Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:04.253293728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6640488Z 2025-12-04T10:11:57.6640824Z [W1204 09:45:04.253854578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6640829Z 2025-12-04T10:11:57.6641131Z [W1204 09:45:04.254001940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6641134Z 2025-12-04T10:11:57.6641423Z [W1204 09:45:04.256955051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6641426Z 2025-12-04T10:11:57.6641721Z [W1204 09:45:04.257516140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6641730Z 2025-12-04T10:11:57.6642021Z [W1204 09:45:04.257657612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6642024Z 2025-12-04T10:11:57.6642311Z [W1204 09:45:04.262205281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6642317Z 2025-12-04T10:11:57.6642607Z [W1204 09:45:04.262668038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6642611Z 2025-12-04T10:11:57.6642905Z [W1204 09:45:04.262804811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6642908Z 2025-12-04T10:11:57.6642975Z FAILED [0.4161s] [100%] 2025-12-04T10:11:57.6642979Z 2025-12-04T10:11:57.6643069Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6643459Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6643595Z Traceback (most recent call last): 2025-12-04T10:11:57.6643952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6644094Z method(*args, **kwargs) 2025-12-04T10:11:57.6644396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6644460Z method(*args, **kwargs) 2025-12-04T10:11:57.6644756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6644820Z with policy(): 2025-12-04T10:11:57.6645122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6645191Z raise RuntimeError(msg) 2025-12-04T10:11:57.6645996Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6646004Z 2025-12-04T10:11:57.6646144Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6646666Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6646671Z 2025-12-04T10:11:57.6646848Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6646984Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6647151Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6647513Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6647643Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6647745Z graph_break [] 2025-12-04T10:11:57.6647870Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6648576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6648656Z if out == self.unknown_value: 2025-12-04T10:11:57.6648952Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6649035Z Traceback (most recent call last): 2025-12-04T10:11:57.6649341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6649407Z method(*args, **kwargs) 2025-12-04T10:11:57.6649702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6649770Z method(*args, **kwargs) 2025-12-04T10:11:57.6650059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6650124Z with policy(): 2025-12-04T10:11:57.6650432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6650503Z raise RuntimeError(msg) 2025-12-04T10:11:57.6651320Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6651324Z 2025-12-04T10:11:57.6651458Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6652016Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6652020Z 2025-12-04T10:11:57.6652180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6652312Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6652409Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6652767Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6652895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6652956Z graph_break [] 2025-12-04T10:11:57.6653084Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6653779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6653849Z if out == self.unknown_value: 2025-12-04T10:11:57.6653977Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6654068Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6654195Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6654632Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6654696Z graph_break [] 2025-12-04T10:11:57.6654786Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6655115Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6655196Z Traceback (most recent call last): 2025-12-04T10:11:57.6655497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6655561Z method(*args, **kwargs) 2025-12-04T10:11:57.6655855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6655920Z method(*args, **kwargs) 2025-12-04T10:11:57.6656214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6656280Z with policy(): 2025-12-04T10:11:57.6656573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6656643Z raise RuntimeError(msg) 2025-12-04T10:11:57.6657463Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6657467Z 2025-12-04T10:11:57.6657595Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6658126Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6658132Z 2025-12-04T10:11:57.6658291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6658422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6658516Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6658899Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6659029Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6659089Z graph_break [] 2025-12-04T10:11:57.6659216Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6659902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6659972Z if out == self.unknown_value: 2025-12-04T10:11:57.6660097Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6660186Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6660313Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6660654Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6660723Z graph_break [] 2025-12-04T10:11:57.6660854Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6660943Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6661064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6661477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6661538Z graph_break [] 2025-12-04T10:11:57.6662031Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml - 2025-12-04T10:11:57.6662168Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6663462Z FAILED [0.4161s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6663466Z 2025-12-04T10:11:57.6663595Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6664111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6664122Z 2025-12-04T10:11:57.6664277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6664381Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6664502Z ================== 1 failed, 57 deselected, 2 rerun in 11.68s ================== 2025-12-04T10:11:57.6664560Z Got exit code 1 2025-12-04T10:11:57.6664625Z Retrying single test... 2025-12-04T10:11:57.6664897Z W1204 09:45:10.881000 47522 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6665286Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml 2025-12-04T10:11:57.6665386Z ============================= test session starts ============================== 2025-12-04T10:11:57.6665597Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6665702Z cachedir: .pytest_cache 2025-12-04T10:11:57.6666015Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6666093Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6666160Z configfile: pytest.ini 2025-12-04T10:11:57.6666478Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6666607Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6667188Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6667260Z Running 1 items in this shard 2025-12-04T10:11:57.6667264Z 2025-12-04T10:11:57.6667989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:11.939141588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6668001Z 2025-12-04T10:11:57.6668304Z [W1204 09:45:21.100585283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6668308Z 2025-12-04T10:11:57.6668600Z [W1204 09:45:21.100842757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6668607Z 2025-12-04T10:11:57.6668973Z [W1204 09:45:21.106475104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6668978Z 2025-12-04T10:11:57.6669268Z [W1204 09:45:21.107027143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6669306Z 2025-12-04T10:11:57.6669599Z [W1204 09:45:21.107192296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6669603Z 2025-12-04T10:11:57.6669890Z [W1204 09:45:21.112576928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6669893Z 2025-12-04T10:11:57.6670183Z [W1204 09:45:21.113090207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6670187Z 2025-12-04T10:11:57.6670475Z [W1204 09:45:21.113254890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6670478Z 2025-12-04T10:11:57.6670563Z ('RERUN', {'yellow': True}) [11.0205s] [100%] 2025-12-04T10:11:57.6671283Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:22.281394816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6671290Z 2025-12-04T10:11:57.6671578Z [W1204 09:45:22.281937095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6671585Z 2025-12-04T10:11:57.6671871Z [W1204 09:45:22.282085178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6671875Z 2025-12-04T10:11:57.6672164Z [W1204 09:45:22.284985577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6672168Z 2025-12-04T10:11:57.6672456Z [W1204 09:45:22.285541866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6672496Z 2025-12-04T10:11:57.6672785Z [W1204 09:45:22.285679179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6672788Z 2025-12-04T10:11:57.6673076Z [W1204 09:45:22.290204746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6673080Z 2025-12-04T10:11:57.6673365Z [W1204 09:45:22.290673254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6673368Z 2025-12-04T10:11:57.6673665Z [W1204 09:45:22.290809187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6673668Z 2025-12-04T10:11:57.6673747Z ('RERUN', {'yellow': True}) [0.4057s] [100%] 2025-12-04T10:11:57.6674466Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:22.686033982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6674472Z 2025-12-04T10:11:57.6674760Z [W1204 09:45:22.686566471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6674763Z 2025-12-04T10:11:57.6675055Z [W1204 09:45:22.686711124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6675062Z 2025-12-04T10:11:57.6675415Z [W1204 09:45:22.689602714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6675419Z 2025-12-04T10:11:57.6675705Z [W1204 09:45:22.690175834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6675743Z 2025-12-04T10:11:57.6676037Z [W1204 09:45:22.690324746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6676040Z 2025-12-04T10:11:57.6676326Z [W1204 09:45:22.694798733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6676330Z 2025-12-04T10:11:57.6676625Z [W1204 09:45:22.695256360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6676628Z 2025-12-04T10:11:57.6676917Z [W1204 09:45:22.695392752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6676920Z 2025-12-04T10:11:57.6676988Z FAILED [0.4025s] [100%] 2025-12-04T10:11:57.6676992Z 2025-12-04T10:11:57.6677078Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6677375Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6677465Z Traceback (most recent call last): 2025-12-04T10:11:57.6677778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6677848Z method(*args, **kwargs) 2025-12-04T10:11:57.6678148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6678211Z method(*args, **kwargs) 2025-12-04T10:11:57.6678514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6678575Z with policy(): 2025-12-04T10:11:57.6678871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6678945Z raise RuntimeError(msg) 2025-12-04T10:11:57.6679799Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6679803Z 2025-12-04T10:11:57.6680013Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6680540Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6680544Z 2025-12-04T10:11:57.6680708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6680835Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6680931Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6681287Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6681417Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6681476Z graph_break [] 2025-12-04T10:11:57.6681609Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6682373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6682453Z if out == self.unknown_value: 2025-12-04T10:11:57.6682747Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6682855Z Traceback (most recent call last): 2025-12-04T10:11:57.6683163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6683228Z method(*args, **kwargs) 2025-12-04T10:11:57.6683525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6683593Z method(*args, **kwargs) 2025-12-04T10:11:57.6683883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6683949Z with policy(): 2025-12-04T10:11:57.6684246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6684313Z raise RuntimeError(msg) 2025-12-04T10:11:57.6685131Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6685137Z 2025-12-04T10:11:57.6685268Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6685795Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6685799Z 2025-12-04T10:11:57.6685955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6686086Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6686179Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6686525Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6686696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6686756Z graph_break [] 2025-12-04T10:11:57.6686894Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6687593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6687663Z if out == self.unknown_value: 2025-12-04T10:11:57.6687809Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6687901Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6688030Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6688374Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6688436Z graph_break [] 2025-12-04T10:11:57.6688524Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6688817Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6688895Z Traceback (most recent call last): 2025-12-04T10:11:57.6689202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6689267Z method(*args, **kwargs) 2025-12-04T10:11:57.6689632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6689696Z method(*args, **kwargs) 2025-12-04T10:11:57.6689990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6690085Z with policy(): 2025-12-04T10:11:57.6690381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6690451Z raise RuntimeError(msg) 2025-12-04T10:11:57.6691264Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6691268Z 2025-12-04T10:11:57.6691399Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6691918Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6691924Z 2025-12-04T10:11:57.6692079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6692208Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6692300Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6692649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6692773Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6692830Z graph_break [] 2025-12-04T10:11:57.6692959Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6693653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6693774Z if out == self.unknown_value: 2025-12-04T10:11:57.6693898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6693988Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6694115Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6694457Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6694516Z graph_break [] 2025-12-04T10:11:57.6694646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6694736Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6694861Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6695200Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6695262Z graph_break [] 2025-12-04T10:11:57.6695750Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml - 2025-12-04T10:11:57.6695854Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6697209Z FAILED [0.4025s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6697246Z 2025-12-04T10:11:57.6697373Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6697910Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6697913Z 2025-12-04T10:11:57.6698069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6698174Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6698293Z ================== 1 failed, 57 deselected, 2 rerun in 11.85s ================== 2025-12-04T10:11:57.6698355Z Got exit code 1 2025-12-04T10:11:57.6698830Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6699075Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6699342Z W1204 09:45:29.332000 47715 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6699733Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml 2025-12-04T10:11:57.6699828Z ============================= test session starts ============================== 2025-12-04T10:11:57.6700043Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6700111Z cachedir: .pytest_cache 2025-12-04T10:11:57.6700420Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6700501Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6700566Z configfile: pytest.ini 
2025-12-04T10:11:57.6700886Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6701056Z collecting ... collected 58 items / 24 deselected / 34 selected 2025-12-04T10:11:57.6701149Z stepcurrent: skipping 24 already run items. 2025-12-04T10:11:57.6701223Z Running 34 items in this shard 2025-12-04T10:11:57.6701230Z 2025-12-04T10:11:57.6701726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9248s] [ 2%] 2025-12-04T10:11:57.6702211Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5342s] [ 2%] 2025-12-04T10:11:57.6702656Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5290s] [ 2%] 2025-12-04T10:11:57.6702662Z 2025-12-04T10:11:57.6702745Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6703049Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6703126Z Traceback (most recent call last): 2025-12-04T10:11:57.6703435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6703506Z method(*args, **kwargs) 2025-12-04T10:11:57.6703864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6703933Z method(*args, **kwargs) 2025-12-04T10:11:57.6704223Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6704282Z with policy(): 2025-12-04T10:11:57.6704637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6704704Z raise RuntimeError(msg) 2025-12-04T10:11:57.6705506Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6705513Z 2025-12-04T10:11:57.6705643Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6706165Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6706168Z 2025-12-04T10:11:57.6706331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6706463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6706562Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6707111Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6707239Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6707303Z graph_break [] 2025-12-04T10:11:57.6707594Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.6707673Z Traceback (most recent call last): 2025-12-04T10:11:57.6707974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6708041Z method(*args, **kwargs) 2025-12-04T10:11:57.6708377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6708440Z method(*args, **kwargs) 2025-12-04T10:11:57.6708728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6708792Z with policy(): 2025-12-04T10:11:57.6709090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6709160Z raise RuntimeError(msg) 2025-12-04T10:11:57.6709972Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6709979Z 2025-12-04T10:11:57.6710102Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6710626Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6710630Z 2025-12-04T10:11:57.6710788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6710919Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6711012Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6711628Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6711757Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6711851Z graph_break [] 2025-12-04T10:11:57.6711979Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6712067Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6712189Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6712730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6712787Z graph_break [] 2025-12-04T10:11:57.6712874Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6713161Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6713234Z 
Traceback (most recent call last): 2025-12-04T10:11:57.6713542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6713610Z method(*args, **kwargs) 2025-12-04T10:11:57.6713904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6713972Z method(*args, **kwargs) 2025-12-04T10:11:57.6714259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6714322Z with policy(): 2025-12-04T10:11:57.6714617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6714683Z raise RuntimeError(msg) 2025-12-04T10:11:57.6715496Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6715540Z 2025-12-04T10:11:57.6715664Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6716187Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6716190Z 2025-12-04T10:11:57.6716348Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6716494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6716585Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6717264Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6717400Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6717458Z graph_break [] 2025-12-04T10:11:57.6717583Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6717675Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6717794Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6718449Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6718511Z graph_break [] 2025-12-04T10:11:57.6718634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6718726Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6718893Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.6719427Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6719484Z graph_break [] 2025-12-04T10:11:57.6720018Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml - 2025-12-04T10:11:57.6720127Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6721410Z FAILED [0.5290s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6721418Z 2025-12-04T10:11:57.6721550Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6722065Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6722069Z 2025-12-04T10:11:57.6722229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6722343Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6722463Z ================== 1 failed, 24 deselected, 2 rerun in 3.01s =================== 2025-12-04T10:11:57.6722528Z Got exit code 1 2025-12-04T10:11:57.6722652Z Retrying single test... 
2025-12-04T10:11:57.6722914Z W1204 09:45:38.987000 47904 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6723300Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml 2025-12-04T10:11:57.6723398Z ============================= test session starts ============================== 2025-12-04T10:11:57.6723608Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6723672Z cachedir: .pytest_cache 2025-12-04T10:11:57.6723982Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6724067Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6724132Z configfile: pytest.ini 2025-12-04T10:11:57.6724453Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6724585Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6725152Z stepcurrent: skipping 24 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6725239Z Running 1 items in this shard 2025-12-04T10:11:57.6725243Z 2025-12-04T10:11:57.6726046Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:40.571251588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6726051Z 2025-12-04T10:11:57.6726354Z [W1204 09:45:49.638281803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6726402Z 2025-12-04T10:11:57.6726693Z [W1204 09:45:49.638538838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6726696Z 2025-12-04T10:11:57.6726989Z [W1204 09:45:49.644813355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6726993Z 2025-12-04T10:11:57.6727281Z [W1204 09:45:49.645408526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6727284Z 2025-12-04T10:11:57.6727576Z [W1204 09:45:49.645581328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6727580Z 2025-12-04T10:11:57.6727865Z [W1204 09:45:49.651069323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6727871Z 2025-12-04T10:11:57.6728157Z [W1204 09:45:49.651617882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6728164Z 2025-12-04T10:11:57.6728459Z [W1204 09:45:49.651788215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6728462Z 2025-12-04T10:11:57.6728542Z ('RERUN', {'yellow': True}) [11.0106s] [100%] 2025-12-04T10:11:57.6729269Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:50.454721756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6729272Z 2025-12-04T10:11:57.6729560Z [W1204 09:45:50.455227735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6729603Z 2025-12-04T10:11:57.6729895Z [W1204 09:45:50.455373717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6729902Z 2025-12-04T10:11:57.6730188Z [W1204 09:45:50.458241117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6730192Z 2025-12-04T10:11:57.6730479Z [W1204 09:45:50.458689504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6730483Z 2025-12-04T10:11:57.6730774Z [W1204 09:45:50.458828437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6730777Z 2025-12-04T10:11:57.6731063Z [W1204 09:45:50.463284513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6731075Z 2025-12-04T10:11:57.6731364Z [W1204 09:45:50.463735671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6731367Z 2025-12-04T10:11:57.6731656Z [W1204 09:45:50.463871533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6731659Z 2025-12-04T10:11:57.6731741Z ('RERUN', {'yellow': True}) [0.4941s] [100%] 2025-12-04T10:11:57.6732546Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:51.945541782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6732551Z 2025-12-04T10:11:57.6732856Z [W1204 09:45:51.946052661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6732894Z 2025-12-04T10:11:57.6733186Z [W1204 09:45:51.946191043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6733189Z 2025-12-04T10:11:57.6733479Z [W1204 09:45:51.949049642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6733482Z 2025-12-04T10:11:57.6733766Z [W1204 09:45:51.949488750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6733769Z 2025-12-04T10:11:57.6734062Z [W1204 09:45:51.949629272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6734066Z 2025-12-04T10:11:57.6734353Z [W1204 09:45:51.954107988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6734358Z 2025-12-04T10:11:57.6734645Z [W1204 09:45:51.954565266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6734651Z 2025-12-04T10:11:57.6734937Z [W1204 09:45:51.954704299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6734940Z 2025-12-04T10:11:57.6735006Z FAILED [0.4933s] [100%] 2025-12-04T10:11:57.6735009Z 2025-12-04T10:11:57.6735094Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6735399Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6735477Z Traceback (most recent call last): 2025-12-04T10:11:57.6735790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6735855Z method(*args, **kwargs) 2025-12-04T10:11:57.6736153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6736254Z method(*args, **kwargs) 2025-12-04T10:11:57.6736547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6736611Z with policy(): 2025-12-04T10:11:57.6736907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6736976Z raise RuntimeError(msg) 2025-12-04T10:11:57.6737786Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6737790Z 2025-12-04T10:11:57.6737918Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6738446Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6738450Z 2025-12-04T10:11:57.6738608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6738739Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6738832Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6739443Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6739576Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6739668Z graph_break [] 2025-12-04T10:11:57.6739803Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6740493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6740563Z if out == self.unknown_value: 2025-12-04T10:11:57.6740868Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6740943Z Traceback (most recent call last): 2025-12-04T10:11:57.6741249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6741312Z method(*args, **kwargs) 2025-12-04T10:11:57.6741602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6741671Z method(*args, **kwargs) 2025-12-04T10:11:57.6741961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6742022Z with policy(): 2025-12-04T10:11:57.6742319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6742385Z raise RuntimeError(msg) 2025-12-04T10:11:57.6743199Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6743203Z 2025-12-04T10:11:57.6743328Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6743854Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6743896Z 2025-12-04T10:11:57.6744054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6744179Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6744275Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6744821Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6744951Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6745010Z graph_break [] 2025-12-04T10:11:57.6745134Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6745829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6745898Z if out == self.unknown_value: 2025-12-04T10:11:57.6746022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6746112Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6746237Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6746845Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6746905Z graph_break [] 2025-12-04T10:11:57.6747099Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6747396Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6747469Z Traceback (most recent call last): 2025-12-04T10:11:57.6747770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6747836Z method(*args, **kwargs) 2025-12-04T10:11:57.6748127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6748195Z method(*args, **kwargs) 2025-12-04T10:11:57.6748508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6748570Z with policy(): 2025-12-04T10:11:57.6748871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6748939Z raise RuntimeError(msg) 2025-12-04T10:11:57.6749751Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6749755Z 2025-12-04T10:11:57.6749877Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6750399Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6750403Z 2025-12-04T10:11:57.6750559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6750684Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6750817Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6751360Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6751487Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6751545Z graph_break [] 2025-12-04T10:11:57.6751669Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6752361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6752428Z if out == self.unknown_value: 2025-12-04T10:11:57.6752555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6752644Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6752766Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6753309Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6753367Z graph_break [] 2025-12-04T10:11:57.6753489Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6753646Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6753767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6754308Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6754401Z graph_break [] 2025-12-04T10:11:57.6754885Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml - 2025-12-04T10:11:57.6754989Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6756279Z FAILED [0.4933s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6756285Z 2025-12-04T10:11:57.6756410Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6756933Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6756936Z 2025-12-04T10:11:57.6757097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6757205Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6757321Z ================== 1 failed, 57 deselected, 2 rerun in 12.02s ================== 2025-12-04T10:11:57.6757385Z Got exit code 1 2025-12-04T10:11:57.6757452Z Retrying single test... 2025-12-04T10:11:57.6757718Z W1204 09:45:57.580000 48098 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6758103Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml 2025-12-04T10:11:57.6758257Z ============================= test session starts ============================== 2025-12-04T10:11:57.6758468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6758532Z cachedir: .pytest_cache 2025-12-04T10:11:57.6758852Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6758930Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6758994Z configfile: pytest.ini 2025-12-04T10:11:57.6759317Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6759445Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6760046Z stepcurrent: skipping 24 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6760123Z Running 1 items in this shard 2025-12-04T10:11:57.6760126Z 2025-12-04T10:11:57.6760850Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:59.177050356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6760854Z 2025-12-04T10:11:57.6761249Z [W1204 09:46:08.077094451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6761253Z 2025-12-04T10:11:57.6761548Z [W1204 09:46:08.077350555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6761585Z 2025-12-04T10:11:57.6761877Z [W1204 09:46:08.083408479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6761881Z 2025-12-04T10:11:57.6762166Z [W1204 09:46:08.084024229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6762170Z 2025-12-04T10:11:57.6762461Z [W1204 09:46:08.084201712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6762464Z 2025-12-04T10:11:57.6762754Z [W1204 09:46:08.089699206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6762757Z 2025-12-04T10:11:57.6763050Z [W1204 09:46:08.090265036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6763053Z 2025-12-04T10:11:57.6763341Z [W1204 09:46:08.090441719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6763346Z 2025-12-04T10:11:57.6763426Z ('RERUN', {'yellow': True}) [10.8494s] [100%] 2025-12-04T10:11:57.6764148Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:46:08.891393948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6764152Z 2025-12-04T10:11:57.6764443Z [W1204 09:46:08.891917327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6764446Z 2025-12-04T10:11:57.6764739Z [W1204 09:46:08.892057639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6764743Z 2025-12-04T10:11:57.6765030Z [W1204 09:46:08.895029540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6765068Z 2025-12-04T10:11:57.6765371Z [W1204 09:46:08.895487578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6765376Z 2025-12-04T10:11:57.6765668Z [W1204 09:46:08.895626890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6765671Z 2025-12-04T10:11:57.6765963Z [W1204 09:46:08.900307750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6765967Z 2025-12-04T10:11:57.6766253Z [W1204 09:46:08.900789989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6766257Z 2025-12-04T10:11:57.6766543Z [W1204 09:46:08.900926601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6766552Z 2025-12-04T10:11:57.6766629Z ('RERUN', {'yellow': True}) [0.4990s] [100%] 2025-12-04T10:11:57.6767353Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:46:09.386663421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6767356Z 2025-12-04T10:11:57.6767713Z [W1204 09:46:09.387173520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6767716Z 2025-12-04T10:11:57.6768014Z [W1204 09:46:09.387312682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6768017Z 2025-12-04T10:11:57.6768343Z [W1204 09:46:09.390234362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6768348Z 2025-12-04T10:11:57.6768634Z [W1204 09:46:09.390690769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6768637Z 2025-12-04T10:11:57.6768929Z [W1204 09:46:09.390829362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6768932Z 2025-12-04T10:11:57.6769218Z [W1204 09:46:09.395373209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6769224Z 2025-12-04T10:11:57.6769515Z [W1204 09:46:09.395824187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6769518Z 2025-12-04T10:11:57.6769805Z [W1204 09:46:09.395959999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6769810Z 2025-12-04T10:11:57.6769870Z FAILED [0.4940s] [100%] 2025-12-04T10:11:57.6769873Z 2025-12-04T10:11:57.6769959Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6770250Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6770327Z Traceback (most recent call last): 2025-12-04T10:11:57.6770643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6770714Z method(*args, **kwargs) 2025-12-04T10:11:57.6771013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6771075Z method(*args, **kwargs) 2025-12-04T10:11:57.6771367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6771469Z with policy(): 2025-12-04T10:11:57.6771764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6771834Z raise RuntimeError(msg) 2025-12-04T10:11:57.6772628Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6772635Z 2025-12-04T10:11:57.6772766Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6773287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6773293Z 2025-12-04T10:11:57.6773451Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6773586Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6773679Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6774226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6774746Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6774810Z graph_break [] 2025-12-04T10:11:57.6774938Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6775626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6775741Z if out == self.unknown_value: 2025-12-04T10:11:57.6776029Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6776104Z Traceback (most recent call last): 2025-12-04T10:11:57.6776404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6776468Z method(*args, **kwargs) 2025-12-04T10:11:57.6776764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6776830Z method(*args, **kwargs) 2025-12-04T10:11:57.6777121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6777185Z with policy(): 2025-12-04T10:11:57.6777481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6777546Z raise RuntimeError(msg) 2025-12-04T10:11:57.6778365Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6778371Z 2025-12-04T10:11:57.6778498Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6779023Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6779028Z 2025-12-04T10:11:57.6779181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6779346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6779445Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6779991Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6780124Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6780184Z graph_break [] 2025-12-04T10:11:57.6780309Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6781010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6781082Z if out == self.unknown_value: 2025-12-04T10:11:57.6781211Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6781307Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6781431Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6781973Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6782032Z graph_break [] 2025-12-04T10:11:57.6782188Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6782482Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6782589Z Traceback (most recent call last): 2025-12-04T10:11:57.6782897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6782960Z method(*args, **kwargs) 2025-12-04T10:11:57.6783255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6783322Z method(*args, **kwargs) 2025-12-04T10:11:57.6783611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6783674Z with policy(): 2025-12-04T10:11:57.6783977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6784047Z raise RuntimeError(msg) 2025-12-04T10:11:57.6784863Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6784871Z 2025-12-04T10:11:57.6784997Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6785517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6785520Z 2025-12-04T10:11:57.6785680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6785807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6785901Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6786441Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6786631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6786689Z graph_break [] 2025-12-04T10:11:57.6786813Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6787506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6787578Z if out == self.unknown_value: 2025-12-04T10:11:57.6787702Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6787791Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6787919Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6788464Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6788525Z graph_break [] 2025-12-04T10:11:57.6788650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6788736Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6788854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6789462Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6789521Z graph_break [] 2025-12-04T10:11:57.6790015Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml - 2025-12-04T10:11:57.6790152Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6791437Z FAILED [0.4940s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6791444Z 2025-12-04T10:11:57.6791569Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6792085Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6792091Z 2025-12-04T10:11:57.6792253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6792358Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6792477Z ================== 1 failed, 57 deselected, 2 rerun in 11.87s ================== 2025-12-04T10:11:57.6792535Z Got exit code 1 2025-12-04T10:11:57.6793005Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6793267Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6793532Z W1204 09:46:15.975000 48292 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6793933Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml 2025-12-04T10:11:57.6794068Z ============================= test session starts ============================== 2025-12-04T10:11:57.6794274Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6794347Z cachedir: .pytest_cache 2025-12-04T10:11:57.6794652Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6794729Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6794798Z configfile: pytest.ini 2025-12-04T10:11:57.6795112Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6795243Z collecting ... collected 58 items / 25 deselected / 33 selected 2025-12-04T10:11:57.6795330Z stepcurrent: skipping 25 already run items. 2025-12-04T10:11:57.6795402Z Running 33 items in this shard 2025-12-04T10:11:57.6795406Z 2025-12-04T10:11:57.6795907Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8940s] [ 3%] 2025-12-04T10:11:57.6796397Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5010s] [ 3%] 2025-12-04T10:11:57.6796913Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4898s] [ 3%] 2025-12-04T10:11:57.6796917Z 2025-12-04T10:11:57.6797001Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6797296Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6797410Z Traceback (most recent call last): 2025-12-04T10:11:57.6797715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6797783Z method(*args, **kwargs) 2025-12-04T10:11:57.6798077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.6798139Z method(*args, **kwargs) 2025-12-04T10:11:57.6798437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6798499Z with policy(): 2025-12-04T10:11:57.6798794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6798863Z raise RuntimeError(msg) 2025-12-04T10:11:57.6799671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6799677Z 2025-12-04T10:11:57.6799807Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6800394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6800398Z 2025-12-04T10:11:57.6800563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6800692Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6800785Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6801140Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6801308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6801372Z graph_break [] 2025-12-04T10:11:57.6801658Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.6801731Z Traceback (most recent call last): 2025-12-04T10:11:57.6802031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6802097Z method(*args, **kwargs) 2025-12-04T10:11:57.6802391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6802457Z method(*args, **kwargs) 2025-12-04T10:11:57.6802751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6802815Z with policy(): 2025-12-04T10:11:57.6803107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6803172Z raise RuntimeError(msg) 2025-12-04T10:11:57.6803988Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6804059Z 2025-12-04T10:11:57.6804185Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6804707Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6804746Z 2025-12-04T10:11:57.6804902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6805028Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6805124Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6805471Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6805600Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6805659Z graph_break [] 2025-12-04T10:11:57.6805784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6805877Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6805998Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6806345Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6806405Z graph_break [] 2025-12-04T10:11:57.6806486Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6806779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6806852Z Traceback (most recent call last): 2025-12-04T10:11:57.6807150Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6807219Z method(*args, **kwargs) 2025-12-04T10:11:57.6807513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6807576Z method(*args, **kwargs) 2025-12-04T10:11:57.6807868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6807966Z with policy(): 2025-12-04T10:11:57.6808274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6808339Z raise RuntimeError(msg) 2025-12-04T10:11:57.6809162Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6809169Z 2025-12-04T10:11:57.6809295Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6809812Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6809819Z 2025-12-04T10:11:57.6809978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6810101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6810194Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6810534Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6810657Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6810787Z graph_break [] 2025-12-04T10:11:57.6810911Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6811003Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6811122Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6811523Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6811585Z graph_break [] 2025-12-04T10:11:57.6811707Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6811794Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6811920Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6812256Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6812321Z graph_break [] 2025-12-04T10:11:57.6812808Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml - 2025-12-04T10:11:57.6812909Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6814206Z FAILED [0.4898s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6814210Z 2025-12-04T10:11:57.6814336Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6814860Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6814864Z 2025-12-04T10:11:57.6815021Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6815168Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6815282Z ================== 1 failed, 25 deselected, 2 rerun in 2.91s =================== 2025-12-04T10:11:57.6815341Z Got exit code 1 2025-12-04T10:11:57.6815420Z Retrying single test... 
2025-12-04T10:11:57.6815688Z W1204 09:46:25.698000 48481 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6816073Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml 2025-12-04T10:11:57.6816175Z ============================= test session starts ============================== 2025-12-04T10:11:57.6816385Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6816456Z cachedir: .pytest_cache 2025-12-04T10:11:57.6816766Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6816843Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6816914Z configfile: pytest.ini 2025-12-04T10:11:57.6817341Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6817479Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6818129Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6818203Z Running 1 items in this shard 2025-12-04T10:11:57.6818206Z 2025-12-04T10:11:57.6818940Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:26.781109894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6818989Z 2025-12-04T10:11:57.6819293Z [W1204 09:46:35.738526287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6819297Z 2025-12-04T10:11:57.6819593Z [W1204 09:46:35.738779521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6819597Z 2025-12-04T10:11:57.6819887Z [W1204 09:46:35.745010618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6819891Z 2025-12-04T10:11:57.6820183Z [W1204 09:46:35.745607378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6820188Z 2025-12-04T10:11:57.6820476Z [W1204 09:46:35.745778641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6820481Z 2025-12-04T10:11:57.6820774Z [W1204 09:46:35.751146243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6820777Z 2025-12-04T10:11:57.6821063Z [W1204 09:46:35.751684102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6821067Z 2025-12-04T10:11:57.6821355Z [W1204 09:46:35.751847365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6821364Z 2025-12-04T10:11:57.6821447Z ('RERUN', {'yellow': True}) [10.8398s] [100%] 2025-12-04T10:11:57.6822172Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:37.951592257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6822212Z 2025-12-04T10:11:57.6822508Z [W1204 09:46:37.952102836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6822511Z 2025-12-04T10:11:57.6822803Z [W1204 09:46:37.952248798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6822806Z 2025-12-04T10:11:57.6823101Z [W1204 09:46:37.955093097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6823104Z 2025-12-04T10:11:57.6823393Z [W1204 09:46:37.955636037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6823396Z 2025-12-04T10:11:57.6823687Z [W1204 09:46:37.955774469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6823693Z 2025-12-04T10:11:57.6823989Z [W1204 09:46:37.960146324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6823993Z 2025-12-04T10:11:57.6824287Z [W1204 09:46:37.960607612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6824290Z 2025-12-04T10:11:57.6824582Z [W1204 09:46:37.960745225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6824660Z 2025-12-04T10:11:57.6824740Z ('RERUN', {'yellow': True}) [0.4432s] [100%] 2025-12-04T10:11:57.6825465Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:37.393116511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6825504Z 2025-12-04T10:11:57.6825793Z [W1204 09:46:37.393626150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6825796Z 2025-12-04T10:11:57.6826089Z [W1204 09:46:37.393764312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6826093Z 2025-12-04T10:11:57.6826381Z [W1204 09:46:37.396576890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6826384Z 2025-12-04T10:11:57.6826677Z [W1204 09:46:37.397114290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6826680Z 2025-12-04T10:11:57.6826966Z [W1204 09:46:37.397252212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6826972Z 2025-12-04T10:11:57.6827262Z [W1204 09:46:37.401614267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6827266Z 2025-12-04T10:11:57.6827565Z [W1204 09:46:37.402068125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6827568Z 2025-12-04T10:11:57.6827858Z [W1204 09:46:37.402203177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6827867Z 2025-12-04T10:11:57.6827930Z FAILED [0.4408s] [100%] 2025-12-04T10:11:57.6827933Z 2025-12-04T10:11:57.6828021Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6828324Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6828437Z Traceback (most recent call last): 2025-12-04T10:11:57.6828743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6828813Z method(*args, **kwargs) 2025-12-04T10:11:57.6829106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6829172Z method(*args, **kwargs) 2025-12-04T10:11:57.6829463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6829524Z with policy(): 2025-12-04T10:11:57.6829826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6829893Z raise RuntimeError(msg) 2025-12-04T10:11:57.6830703Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6830711Z 2025-12-04T10:11:57.6830842Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6831361Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6831368Z 2025-12-04T10:11:57.6831594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6831726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6831822Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6832200Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6832329Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6832392Z graph_break [] 2025-12-04T10:11:57.6832516Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6833211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6833287Z if out == self.unknown_value: 2025-12-04T10:11:57.6833579Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6833658Z Traceback (most recent call last): 2025-12-04T10:11:57.6833960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6834029Z method(*args, **kwargs) 2025-12-04T10:11:57.6834320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6834381Z method(*args, **kwargs) 2025-12-04T10:11:57.6834677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6834736Z with policy(): 2025-12-04T10:11:57.6835030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6835100Z raise RuntimeError(msg) 2025-12-04T10:11:57.6835927Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6835970Z 2025-12-04T10:11:57.6836102Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6836621Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6836625Z 2025-12-04T10:11:57.6836785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6836912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6837008Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6837358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6837488Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6837546Z graph_break [] 2025-12-04T10:11:57.6837676Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6838372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6838452Z if out == self.unknown_value: 2025-12-04T10:11:57.6838576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6838752Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6838879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6839226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6839323Z graph_break [] 2025-12-04T10:11:57.6839407Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6839696Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6839778Z Traceback (most recent call last): 2025-12-04T10:11:57.6840135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6840201Z method(*args, **kwargs) 2025-12-04T10:11:57.6840498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6840560Z method(*args, **kwargs) 2025-12-04T10:11:57.6840854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6840915Z with policy(): 2025-12-04T10:11:57.6841210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6841284Z raise RuntimeError(msg) 2025-12-04T10:11:57.6842097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6842101Z 2025-12-04T10:11:57.6842228Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6842749Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6842753Z 2025-12-04T10:11:57.6842918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6843085Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6843175Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6843519Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6843642Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6843702Z graph_break [] 2025-12-04T10:11:57.6843828Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6844518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6844592Z if out == self.unknown_value: 2025-12-04T10:11:57.6844718Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6844810Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6844934Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6845274Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6845336Z graph_break [] 2025-12-04T10:11:57.6845457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6845611Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6845736Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6846074Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6846166Z graph_break [] 2025-12-04T10:11:57.6846655Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml - 2025-12-04T10:11:57.6846756Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6848054Z FAILED [0.4408s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6848059Z 2025-12-04T10:11:57.6848186Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6848721Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6848725Z 2025-12-04T10:11:57.6848882Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6848988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6849104Z ================== 1 failed, 57 deselected, 2 rerun in 11.75s ================== 2025-12-04T10:11:57.6849163Z Got exit code 1 2025-12-04T10:11:57.6849230Z Retrying single test... 2025-12-04T10:11:57.6849499Z W1204 09:46:44.099000 48674 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6849883Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml 2025-12-04T10:11:57.6850021Z ============================= test session starts ============================== 2025-12-04T10:11:57.6850229Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6850298Z cachedir: .pytest_cache 2025-12-04T10:11:57.6850602Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6850678Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6850750Z configfile: pytest.ini 2025-12-04T10:11:57.6851071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6851200Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6851774Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6851847Z Running 1 items in this shard 2025-12-04T10:11:57.6851851Z 2025-12-04T10:11:57.6852585Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:45.197260818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6852589Z 2025-12-04T10:11:57.6852886Z [W1204 09:46:54.444797202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6852955Z 2025-12-04T10:11:57.6853252Z [W1204 09:46:54.445047086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6853256Z 2025-12-04T10:11:57.6853543Z [W1204 09:46:54.451006968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6853584Z 2025-12-04T10:11:57.6853872Z [W1204 09:46:54.451608368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6853875Z 2025-12-04T10:11:57.6854161Z [W1204 09:46:54.451776261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6854164Z 2025-12-04T10:11:57.6854451Z [W1204 09:46:54.457431578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6854457Z 2025-12-04T10:11:57.6854747Z [W1204 09:46:54.457982597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6854750Z 2025-12-04T10:11:57.6855036Z [W1204 09:46:54.458143790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6855042Z 2025-12-04T10:11:57.6855140Z ('RERUN', {'yellow': True}) [11.1549s] [100%] 2025-12-04T10:11:57.6855866Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:55.670629160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6855870Z 2025-12-04T10:11:57.6856160Z [W1204 09:46:55.671172150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6856164Z 2025-12-04T10:11:57.6856454Z [W1204 09:46:55.671318762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6856457Z 2025-12-04T10:11:57.6856751Z [W1204 09:46:55.674330774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6856791Z 2025-12-04T10:11:57.6857079Z [W1204 09:46:55.674908434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6857082Z 2025-12-04T10:11:57.6857369Z [W1204 09:46:55.675049736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6857376Z 2025-12-04T10:11:57.6857662Z [W1204 09:46:55.679723737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6857665Z 2025-12-04T10:11:57.6857955Z [W1204 09:46:55.680214216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6857958Z 2025-12-04T10:11:57.6858251Z [W1204 09:46:55.680371848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6858257Z 2025-12-04T10:11:57.6858336Z ('RERUN', {'yellow': True}) [0.4530s] [100%] 2025-12-04T10:11:57.6859063Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:56.121249793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6859067Z 2025-12-04T10:11:57.6859355Z [W1204 09:46:56.121789622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6859358Z 2025-12-04T10:11:57.6859797Z [W1204 09:46:56.121933844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6859800Z 2025-12-04T10:11:57.6860088Z [W1204 09:46:56.124920655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6860124Z 2025-12-04T10:11:57.6860419Z [W1204 09:46:56.125483765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6860422Z 2025-12-04T10:11:57.6860708Z [W1204 09:46:56.125621698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6860711Z 2025-12-04T10:11:57.6860994Z [W1204 09:46:56.130344708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6861003Z 2025-12-04T10:11:57.6861291Z [W1204 09:46:56.130819956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6861294Z 2025-12-04T10:11:57.6861579Z [W1204 09:46:56.130957259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6861585Z 2025-12-04T10:11:57.6861650Z FAILED [0.4479s] [100%] 2025-12-04T10:11:57.6861654Z 2025-12-04T10:11:57.6861734Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6862030Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6862105Z Traceback (most recent call last): 2025-12-04T10:11:57.6862410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6862482Z method(*args, **kwargs) 2025-12-04T10:11:57.6862775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6862839Z method(*args, **kwargs) 2025-12-04T10:11:57.6863138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6863234Z with policy(): 2025-12-04T10:11:57.6863531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6863596Z raise RuntimeError(msg) 2025-12-04T10:11:57.6864400Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6864409Z 2025-12-04T10:11:57.6864539Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6865056Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6865061Z 2025-12-04T10:11:57.6865224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6865362Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6865461Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6865814Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6865941Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6866001Z graph_break [] 2025-12-04T10:11:57.6866209Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6866903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6867014Z if out == self.unknown_value: 2025-12-04T10:11:57.6867302Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6867382Z Traceback (most recent call last): 2025-12-04T10:11:57.6867680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6867744Z method(*args, **kwargs) 2025-12-04T10:11:57.6868038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6868101Z method(*args, **kwargs) 2025-12-04T10:11:57.6868397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6868456Z with policy(): 2025-12-04T10:11:57.6868756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6868832Z raise RuntimeError(msg) 2025-12-04T10:11:57.6869647Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6869651Z 2025-12-04T10:11:57.6869791Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6870315Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6870319Z 2025-12-04T10:11:57.6870475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6870604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6870737Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6871089Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6871218Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6871277Z graph_break [] 2025-12-04T10:11:57.6871403Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6872096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6872175Z if out == self.unknown_value: 2025-12-04T10:11:57.6872298Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6872392Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6872522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6872864Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6872924Z graph_break [] 2025-12-04T10:11:57.6873011Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6873300Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.6873445Z Traceback (most recent call last): 2025-12-04T10:11:57.6873745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6873807Z method(*args, **kwargs) 2025-12-04T10:11:57.6874111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6874209Z method(*args, **kwargs) 2025-12-04T10:11:57.6874497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6874561Z with policy(): 2025-12-04T10:11:57.6874855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6874924Z raise RuntimeError(msg) 2025-12-04T10:11:57.6875742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6875746Z 2025-12-04T10:11:57.6875872Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6876394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6876397Z 2025-12-04T10:11:57.6876553Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6876684Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6876775Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6877124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6877246Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6877304Z graph_break [] 2025-12-04T10:11:57.6877430Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6878154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6878223Z if out == self.unknown_value: 2025-12-04T10:11:57.6878348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6878436Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6878559Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6878903Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6878961Z graph_break [] 2025-12-04T10:11:57.6879088Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6879189Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6879318Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6879661Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6879719Z graph_break [] 2025-12-04T10:11:57.6880253Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml - 2025-12-04T10:11:57.6880354Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6881719Z FAILED [0.4479s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6881758Z 2025-12-04T10:11:57.6881883Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6882405Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6882408Z 2025-12-04T10:11:57.6882564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6882670Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6882789Z ================== 1 failed, 57 deselected, 2 rerun in 12.08s ================== 2025-12-04T10:11:57.6882849Z Got exit code 1 2025-12-04T10:11:57.6883326Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.6883574Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6883834Z W1204 09:47:02.758000 48867 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6884222Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml 2025-12-04T10:11:57.6884318Z ============================= test session starts ============================== 2025-12-04T10:11:57.6884523Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6884593Z cachedir: .pytest_cache 2025-12-04T10:11:57.6884899Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6885019Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6885087Z configfile: pytest.ini 
2025-12-04T10:11:57.6885399Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6885530Z collecting ... collected 58 items / 26 deselected / 32 selected 2025-12-04T10:11:57.6885618Z stepcurrent: skipping 26 already run items. 2025-12-04T10:11:57.6885688Z Running 32 items in this shard 2025-12-04T10:11:57.6885697Z 2025-12-04T10:11:57.6886193Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8462s] [ 3%] 2025-12-04T10:11:57.6886674Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4490s] [ 3%] 2025-12-04T10:11:57.6887121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4405s] [ 3%] 2025-12-04T10:11:57.6887125Z 2025-12-04T10:11:57.6887206Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6887497Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6887571Z Traceback (most recent call last): 2025-12-04T10:11:57.6887944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6888016Z method(*args, **kwargs) 2025-12-04T10:11:57.6888313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6888414Z method(*args, **kwargs) 2025-12-04T10:11:57.6888706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.6888770Z with policy(): 2025-12-04T10:11:57.6889078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6889155Z raise RuntimeError(msg) 2025-12-04T10:11:57.6889953Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6889963Z 2025-12-04T10:11:57.6890090Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6890601Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6890609Z 2025-12-04T10:11:57.6890772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6890903Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6891000Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6891345Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6891474Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6891537Z graph_break [] 2025-12-04T10:11:57.6891824Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6891897Z Traceback (most recent call last): 2025-12-04T10:11:57.6892256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:11:57.6892321Z method(*args, **kwargs) 2025-12-04T10:11:57.6892613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6892678Z method(*args, **kwargs) 2025-12-04T10:11:57.6892966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6893031Z with policy(): 2025-12-04T10:11:57.6893331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6893395Z raise RuntimeError(msg) 2025-12-04T10:11:57.6894200Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6894207Z 2025-12-04T10:11:57.6894329Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6894850Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6894853Z 2025-12-04T10:11:57.6895012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6895205Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6895299Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6895644Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6895808Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6895869Z graph_break [] 2025-12-04T10:11:57.6895998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6896089Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6896213Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6896559Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6896616Z graph_break [] 2025-12-04T10:11:57.6896702Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6896996Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6897071Z Traceback (most recent call last): 2025-12-04T10:11:57.6897380Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6897443Z method(*args, **kwargs) 2025-12-04T10:11:57.6897733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6897802Z method(*args, **kwargs) 2025-12-04T10:11:57.6898092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6898151Z with policy(): 2025-12-04T10:11:57.6898448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6898513Z raise RuntimeError(msg) 2025-12-04T10:11:57.6899320Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6899365Z 2025-12-04T10:11:57.6899490Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6900009Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6900014Z 2025-12-04T10:11:57.6900176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6900305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6900399Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6900741Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6900869Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6900926Z graph_break [] 2025-12-04T10:11:57.6901052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6901143Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6901265Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6901607Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6901673Z graph_break [] 2025-12-04T10:11:57.6901867Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6901966Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6902087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6902456Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6902520Z graph_break [] 2025-12-04T10:11:57.6903009Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml - 2025-12-04T10:11:57.6903114Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6904386Z FAILED [0.4405s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6904394Z 2025-12-04T10:11:57.6904520Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6905035Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6905038Z 2025-12-04T10:11:57.6905193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6905307Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6905424Z ================== 1 failed, 26 deselected, 2 rerun in 2.76s =================== 2025-12-04T10:11:57.6905484Z Got exit code 1 2025-12-04T10:11:57.6905550Z Retrying single test... 
2025-12-04T10:11:57.6905812Z W1204 09:47:12.424000 49048 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6906202Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml 2025-12-04T10:11:57.6906341Z ============================= test session starts ============================== 2025-12-04T10:11:57.6906546Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6906617Z cachedir: .pytest_cache 2025-12-04T10:11:57.6906923Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6907000Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6907065Z configfile: pytest.ini 2025-12-04T10:11:57.6907383Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6907517Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6908081Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6908155Z Running 1 items in this shard 2025-12-04T10:11:57.6908165Z 2025-12-04T10:11:57.6908892Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:13.468931279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6908897Z 2025-12-04T10:11:57.6909264Z [W1204 09:47:22.608270471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6909272Z 2025-12-04T10:11:57.6909565Z [W1204 09:47:22.608521445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6909603Z 2025-12-04T10:11:57.6909892Z [W1204 09:47:22.614159122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6909895Z 2025-12-04T10:11:57.6910199Z [W1204 09:47:22.614688740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6910202Z 2025-12-04T10:11:57.6910496Z [W1204 09:47:22.614860534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6910500Z 2025-12-04T10:11:57.6910797Z [W1204 09:47:22.620152374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6910801Z 2025-12-04T10:11:57.6911087Z [W1204 09:47:22.620685213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6911092Z 2025-12-04T10:11:57.6911383Z [W1204 09:47:22.620847496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6911386Z 2025-12-04T10:11:57.6911469Z ('RERUN', {'yellow': True}) [10.9930s] [100%] 2025-12-04T10:11:57.6912189Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:23.796426214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6912198Z 2025-12-04T10:11:57.6912493Z [W1204 09:47:23.796999693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6912496Z 2025-12-04T10:11:57.6912786Z [W1204 09:47:23.797144736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6912827Z 2025-12-04T10:11:57.6913119Z [W1204 09:47:23.800117977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6913122Z 2025-12-04T10:11:57.6913409Z [W1204 09:47:23.800701457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6913412Z 2025-12-04T10:11:57.6913703Z [W1204 09:47:23.800843069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6913706Z 2025-12-04T10:11:57.6913996Z [W1204 09:47:23.805405008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6913999Z 2025-12-04T10:11:57.6914290Z [W1204 09:47:23.805870696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6914295Z 2025-12-04T10:11:57.6914586Z [W1204 09:47:23.806008088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6914590Z 2025-12-04T10:11:57.6914671Z ('RERUN', {'yellow': True}) [0.4142s] [100%] 2025-12-04T10:11:57.6915387Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:24.207798797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6915391Z 2025-12-04T10:11:57.6915745Z [W1204 09:47:24.208371647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6915753Z 2025-12-04T10:11:57.6916039Z [W1204 09:47:24.208523710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6916100Z 2025-12-04T10:11:57.6916393Z [W1204 09:47:24.211498741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6916396Z 2025-12-04T10:11:57.6916687Z [W1204 09:47:24.212069401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6916690Z 2025-12-04T10:11:57.6916977Z [W1204 09:47:24.212210073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6917088Z 2025-12-04T10:11:57.6917386Z [W1204 09:47:24.216742481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6917389Z 2025-12-04T10:11:57.6917677Z [W1204 09:47:24.217204769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6917680Z 2025-12-04T10:11:57.6917968Z [W1204 09:47:24.217342092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6917973Z 2025-12-04T10:11:57.6918033Z FAILED [0.4100s] [100%] 2025-12-04T10:11:57.6918037Z 2025-12-04T10:11:57.6918120Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6918415Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6918489Z Traceback (most recent call last): 2025-12-04T10:11:57.6918801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6918867Z method(*args, **kwargs) 2025-12-04T10:11:57.6919162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6919228Z method(*args, **kwargs) 2025-12-04T10:11:57.6919520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6919650Z with policy(): 2025-12-04T10:11:57.6919985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6920053Z raise RuntimeError(msg) 2025-12-04T10:11:57.6920871Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.6920875Z 2025-12-04T10:11:57.6921003Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6921521Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6921528Z 2025-12-04T10:11:57.6921687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6921814Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6921915Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6922260Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6922390Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6922547Z graph_break [] 2025-12-04T10:11:57.6922675Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6923374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6923492Z if out == self.unknown_value: 2025-12-04T10:11:57.6923784Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6923862Z Traceback (most recent call last): 2025-12-04T10:11:57.6924161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6924227Z method(*args, **kwargs) 2025-12-04T10:11:57.6924521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6924584Z method(*args, **kwargs) 2025-12-04T10:11:57.6924877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6924937Z with policy(): 2025-12-04T10:11:57.6925240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6925305Z raise RuntimeError(msg) 2025-12-04T10:11:57.6926104Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6926108Z 2025-12-04T10:11:57.6926236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6926754Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6926757Z 2025-12-04T10:11:57.6926916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6927080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6927174Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6927524Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6927651Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6927728Z graph_break [] 2025-12-04T10:11:57.6927854Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6928548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6928623Z if out == self.unknown_value: 2025-12-04T10:11:57.6928749Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6928847Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6928970Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6929312Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6929373Z graph_break [] 2025-12-04T10:11:57.6929456Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6929810Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6929888Z Traceback (most recent call last): 2025-12-04T10:11:57.6930188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6930293Z method(*args, **kwargs) 2025-12-04T10:11:57.6930593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6930657Z method(*args, **kwargs) 2025-12-04T10:11:57.6930953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6931011Z with policy(): 2025-12-04T10:11:57.6931302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6931382Z raise RuntimeError(msg) 2025-12-04T10:11:57.6932192Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6932198Z 2025-12-04T10:11:57.6932328Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6932845Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6932848Z 2025-12-04T10:11:57.6933009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6933133Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6933223Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6933577Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6933699Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6933765Z graph_break [] 2025-12-04T10:11:57.6933928Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6934613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6934686Z if out == self.unknown_value: 2025-12-04T10:11:57.6934807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6934899Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6935028Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6935371Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6935439Z graph_break [] 2025-12-04T10:11:57.6935571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6935662Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6935785Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6936127Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6936189Z graph_break [] 2025-12-04T10:11:57.6936672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml - 2025-12-04T10:11:57.6936845Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6938129Z FAILED [0.4100s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6938167Z 2025-12-04T10:11:57.6938291Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6938809Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6938812Z 2025-12-04T10:11:57.6938969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6939075Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6939191Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ================== 2025-12-04T10:11:57.6939255Z Got exit code 1 2025-12-04T10:11:57.6939328Z Retrying single test... 2025-12-04T10:11:57.6939589Z W1204 09:47:30.809000 49234 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6939975Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml 2025-12-04T10:11:57.6940071Z ============================= test session starts ============================== 2025-12-04T10:11:57.6940279Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6940355Z cachedir: .pytest_cache 2025-12-04T10:11:57.6940660Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6940735Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6940806Z configfile: pytest.ini 2025-12-04T10:11:57.6941159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6941293Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.6941863Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6941931Z Running 1 items in this shard 2025-12-04T10:11:57.6941934Z 2025-12-04T10:11:57.6942671Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:31.867734784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6942675Z 2025-12-04T10:11:57.6942971Z [W1204 09:47:41.987256046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6942977Z 2025-12-04T10:11:57.6943273Z [W1204 09:47:41.987511401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6943277Z 2025-12-04T10:11:57.6943568Z [W1204 09:47:41.993481393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6943571Z 2025-12-04T10:11:57.6943859Z [W1204 09:47:41.994077613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6943863Z 2025-12-04T10:11:57.6944252Z [W1204 09:47:41.994245386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6944256Z 2025-12-04T10:11:57.6944552Z [W1204 09:47:41.999602488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6944591Z 2025-12-04T10:11:57.6944880Z [W1204 09:47:41.000143247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6944882Z 2025-12-04T10:11:57.6945169Z [W1204 09:47:41.000319840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6945176Z 2025-12-04T10:11:57.6945256Z ('RERUN', {'yellow': True}) [10.9862s] [100%] 2025-12-04T10:11:57.6945975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:42.176240715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6945979Z 2025-12-04T10:11:57.6946274Z [W1204 09:47:42.176833565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6946279Z 2025-12-04T10:11:57.6946567Z [W1204 09:47:42.176979528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6946570Z 2025-12-04T10:11:57.6946861Z [W1204 09:47:42.179905278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6946865Z 2025-12-04T10:11:57.6947154Z [W1204 09:47:42.180501868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6947157Z 2025-12-04T10:11:57.6947450Z [W1204 09:47:42.180647451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6947453Z 2025-12-04T10:11:57.6947740Z [W1204 09:47:42.185136827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6947781Z 2025-12-04T10:11:57.6948074Z [W1204 09:47:42.185595005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6948077Z 2025-12-04T10:11:57.6948366Z [W1204 09:47:42.185730668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6948369Z 2025-12-04T10:11:57.6948450Z ('RERUN', {'yellow': True}) [0.4113s] [100%] 2025-12-04T10:11:57.6949190Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:42.584632488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6949194Z 2025-12-04T10:11:57.6949483Z [W1204 09:47:42.585215048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6949490Z 2025-12-04T10:11:57.6949781Z [W1204 09:47:42.585359050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6949785Z 2025-12-04T10:11:57.6950071Z [W1204 09:47:42.588259700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6950075Z 2025-12-04T10:11:57.6950365Z [W1204 09:47:42.588826930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.6950368Z 2025-12-04T10:11:57.6950721Z [W1204 09:47:42.588967022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6950725Z 2025-12-04T10:11:57.6951019Z [W1204 09:47:42.593462549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6951057Z 2025-12-04T10:11:57.6951344Z [W1204 09:47:42.593962038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6951347Z 2025-12-04T10:11:57.6951633Z [W1204 09:47:42.594099000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.6951639Z 2025-12-04T10:11:57.6951703Z FAILED [0.4066s] [100%] 2025-12-04T10:11:57.6951706Z 2025-12-04T10:11:57.6951787Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6952084Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6952159Z Traceback (most recent call last): 2025-12-04T10:11:57.6952469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6952541Z method(*args, **kwargs) 2025-12-04T10:11:57.6952836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6952903Z method(*args, **kwargs) 2025-12-04T10:11:57.6953189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6953248Z with policy(): 2025-12-04T10:11:57.6953547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.6953612Z raise RuntimeError(msg) 2025-12-04T10:11:57.6954410Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6954451Z 2025-12-04T10:11:57.6954581Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6955099Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6955108Z 2025-12-04T10:11:57.6955275Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6955403Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6955505Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6955855Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6955982Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6956047Z graph_break [] 2025-12-04T10:11:57.6956173Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6956873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6956944Z if out == self.unknown_value: 2025-12-04T10:11:57.6957232Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6957311Z Traceback (most recent call last): 2025-12-04T10:11:57.6957680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6957797Z method(*args, **kwargs) 2025-12-04T10:11:57.6958154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6958374Z method(*args, **kwargs) 2025-12-04T10:11:57.6958724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6958812Z with policy(): 2025-12-04T10:11:57.6963099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6963209Z raise RuntimeError(msg) 2025-12-04T10:11:57.6964063Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6964069Z 2025-12-04T10:11:57.6964210Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6964748Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6964757Z 2025-12-04T10:11:57.6964922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6965064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6965164Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6965517Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6965657Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6965718Z graph_break [] 2025-12-04T10:11:57.6965864Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6966574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6966723Z if out == self.unknown_value: 2025-12-04T10:11:57.6966861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6966959Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6967093Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6967442Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6967502Z graph_break [] 2025-12-04T10:11:57.6967590Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6967891Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.6967973Z Traceback (most recent call last): 2025-12-04T10:11:57.6968314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6968381Z method(*args, **kwargs) 2025-12-04T10:11:57.6968676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6968737Z method(*args, **kwargs) 2025-12-04T10:11:57.6969027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6969163Z with policy(): 2025-12-04T10:11:57.6969473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6969544Z raise RuntimeError(msg) 2025-12-04T10:11:57.6970359Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.6970400Z 2025-12-04T10:11:57.6970535Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6971058Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6971062Z 2025-12-04T10:11:57.6971228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6971363Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6971459Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6971813Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6971950Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6972008Z graph_break [] 2025-12-04T10:11:57.6972134Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.6972838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.6972912Z if out == self.unknown_value: 2025-12-04T10:11:57.6973037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6973128Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6973260Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6973659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6973718Z graph_break [] 2025-12-04T10:11:57.6973846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6973933Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6974054Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6974409Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6974472Z graph_break [] 2025-12-04T10:11:57.6974974Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml - 2025-12-04T10:11:57.6975080Z =========================== short test summary info ============================ 2025-12-04T10:11:57.6976361Z FAILED [0.4066s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6976370Z 2025-12-04T10:11:57.6976588Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6977110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6977146Z 2025-12-04T10:11:57.6977310Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6977421Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.6977541Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ================== 2025-12-04T10:11:57.6977605Z Got exit code 1 2025-12-04T10:11:57.6978078Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.6978326Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.6978593Z W1204 09:47:49.188000 49420 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.6978985Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml 2025-12-04T10:11:57.6979090Z ============================= test session starts ============================== 2025-12-04T10:11:57.6979297Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.6979367Z cachedir: .pytest_cache 2025-12-04T10:11:57.6979685Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.6979764Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.6979835Z configfile: pytest.ini 
2025-12-04T10:11:57.6980153Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.6980289Z collecting ... collected 58 items / 27 deselected / 31 selected 2025-12-04T10:11:57.6980377Z stepcurrent: skipping 27 already run items. 2025-12-04T10:11:57.6980447Z Running 31 items in this shard 2025-12-04T10:11:57.6980451Z 2025-12-04T10:11:57.6980950Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9343s] [ 3%] 2025-12-04T10:11:57.6981471Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5512s] [ 3%] 2025-12-04T10:11:57.6981914Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5439s] [ 3%] 2025-12-04T10:11:57.6981918Z 2025-12-04T10:11:57.6982003Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.6982290Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6982368Z Traceback (most recent call last): 2025-12-04T10:11:57.6982678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6982749Z method(*args, **kwargs) 2025-12-04T10:11:57.6983037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6983100Z method(*args, **kwargs) 2025-12-04T10:11:57.6983388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.6983448Z with policy(): 2025-12-04T10:11:57.6983811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6983880Z raise RuntimeError(msg) 2025-12-04T10:11:57.6984671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.6984710Z 2025-12-04T10:11:57.6984841Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6985353Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6985358Z 2025-12-04T10:11:57.6985520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6985650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6985745Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6986295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6986425Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6986486Z graph_break [] 2025-12-04T10:11:57.6986772Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6986844Z Traceback (most recent call last): 2025-12-04T10:11:57.6987336Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6987406Z method(*args, **kwargs) 2025-12-04T10:11:57.6987704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6987771Z method(*args, **kwargs) 2025-12-04T10:11:57.6988057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6988188Z with policy(): 2025-12-04T10:11:57.6988492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6988557Z raise RuntimeError(msg) 2025-12-04T10:11:57.6989362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.6989366Z 2025-12-04T10:11:57.6989495Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6990012Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6990017Z 2025-12-04T10:11:57.6990176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6990308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6990400Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6990943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6991075Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6991134Z graph_break [] 2025-12-04T10:11:57.6991327Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6991424Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6991544Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6992121Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6992183Z graph_break [] 2025-12-04T10:11:57.6992272Z =================================== FAILURES =================================== 2025-12-04T10:11:57.6992561Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.6992635Z 
Traceback (most recent call last): 2025-12-04T10:11:57.6992938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6993001Z method(*args, **kwargs) 2025-12-04T10:11:57.6993288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.6993354Z method(*args, **kwargs) 2025-12-04T10:11:57.6993643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.6993702Z with policy(): 2025-12-04T10:11:57.6993994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.6994057Z raise RuntimeError(msg) 2025-12-04T10:11:57.6994872Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.6994878Z 2025-12-04T10:11:57.6995006Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.6995523Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.6995567Z 2025-12-04T10:11:57.6995721Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.6995844Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6995936Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6996476Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6996606Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6996663Z graph_break [] 2025-12-04T10:11:57.6996787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6996877Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6996998Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.6997529Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6997589Z graph_break [] 2025-12-04T10:11:57.6997709Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.6997798Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.6997918Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.6998514Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.6998605Z graph_break [] 2025-12-04T10:11:57.6999109Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml - 2025-12-04T10:11:57.6999218Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7000602Z FAILED [0.5439s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7000608Z 2025-12-04T10:11:57.7000741Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7001257Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7001263Z 2025-12-04T10:11:57.7001416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7001523Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7001635Z ================== 1 failed, 27 deselected, 2 rerun in 3.05s =================== 2025-12-04T10:11:57.7001695Z Got exit code 1 2025-12-04T10:11:57.7001761Z Retrying single test... 
2025-12-04T10:11:57.7002026Z W1204 09:47:58.818000 49602 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7002413Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml 2025-12-04T10:11:57.7002510Z ============================= test session starts ============================== 2025-12-04T10:11:57.7002777Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7002845Z cachedir: .pytest_cache 2025-12-04T10:11:57.7003151Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7003230Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7003293Z configfile: pytest.ini 2025-12-04T10:11:57.7003608Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7003745Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7004309Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7004384Z Running 1 items in this shard 2025-12-04T10:11:57.7004388Z 2025-12-04T10:11:57.7005111Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:00.415014067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7005116Z 2025-12-04T10:11:57.7005417Z [W1204 09:48:09.482255486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7005420Z 2025-12-04T10:11:57.7005798Z [W1204 09:48:09.482524060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7005803Z 2025-12-04T10:11:57.7006090Z [W1204 09:48:09.488645096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7006128Z 2025-12-04T10:11:57.7006421Z [W1204 09:48:09.489229386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7006424Z 2025-12-04T10:11:57.7006713Z [W1204 09:48:09.489402869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7006716Z 2025-12-04T10:11:57.7007006Z [W1204 09:48:09.494927064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7007010Z 2025-12-04T10:11:57.7007297Z [W1204 09:48:09.495468174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7007300Z 2025-12-04T10:11:57.7007594Z [W1204 09:48:09.495629156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7007599Z 2025-12-04T10:11:57.7007682Z ('RERUN', {'yellow': True}) [11.0182s] [100%] 2025-12-04T10:11:57.7008407Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:10.295454427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7008411Z 2025-12-04T10:11:57.7008700Z [W1204 09:48:10.296000186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7008704Z 2025-12-04T10:11:57.7009001Z [W1204 09:48:10.296138049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7009008Z 2025-12-04T10:11:57.7009296Z [W1204 09:48:10.299062429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7009336Z 2025-12-04T10:11:57.7009626Z [W1204 09:48:10.299509557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7009629Z 2025-12-04T10:11:57.7009920Z [W1204 09:48:10.299644759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7009924Z 2025-12-04T10:11:57.7010209Z [W1204 09:48:10.304161947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7010213Z 2025-12-04T10:11:57.7010504Z [W1204 09:48:10.304628815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7010508Z 2025-12-04T10:11:57.7010794Z [W1204 09:48:10.304763547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7010799Z 2025-12-04T10:11:57.7010882Z ('RERUN', {'yellow': True}) [0.4995s] [100%] 2025-12-04T10:11:57.7011595Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:10.794096739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7011599Z 2025-12-04T10:11:57.7011898Z [W1204 09:48:10.794637819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7011902Z 2025-12-04T10:11:57.7012256Z [W1204 09:48:10.794775551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7012260Z 2025-12-04T10:11:57.7012548Z [W1204 09:48:10.797719781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7012589Z 2025-12-04T10:11:57.7012876Z [W1204 09:48:10.798167049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7012878Z 2025-12-04T10:11:57.7013168Z [W1204 09:48:10.798301771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7013171Z 2025-12-04T10:11:57.7013475Z [W1204 09:48:10.802953912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7013478Z 2025-12-04T10:11:57.7013766Z [W1204 09:48:10.803413050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7013769Z 2025-12-04T10:11:57.7014061Z [W1204 09:48:10.803550042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7014064Z 2025-12-04T10:11:57.7014125Z FAILED [0.4978s] [100%] 2025-12-04T10:11:57.7014130Z 2025-12-04T10:11:57.7014217Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7014511Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7014586Z Traceback (most recent call last): 2025-12-04T10:11:57.7014897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7014966Z method(*args, **kwargs) 2025-12-04T10:11:57.7015260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7015327Z method(*args, **kwargs) 2025-12-04T10:11:57.7015612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7015676Z with policy(): 2025-12-04T10:11:57.7015971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7016076Z raise RuntimeError(msg) 2025-12-04T10:11:57.7016880Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.7016886Z 2025-12-04T10:11:57.7017416Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7017958Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7017963Z 2025-12-04T10:11:57.7018129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7018263Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7018374Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7018930Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7019065Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7019125Z graph_break [] 2025-12-04T10:11:57.7019379Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7020097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7020219Z if out == self.unknown_value: 2025-12-04T10:11:57.7020520Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7020594Z Traceback (most recent call last): 2025-12-04T10:11:57.7020900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7020969Z method(*args, **kwargs) 2025-12-04T10:11:57.7021259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7021324Z method(*args, **kwargs) 2025-12-04T10:11:57.7021616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7021675Z with policy(): 2025-12-04T10:11:57.7021969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7022037Z raise RuntimeError(msg) 2025-12-04T10:11:57.7022845Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7022854Z 2025-12-04T10:11:57.7022984Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7023506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7023510Z 2025-12-04T10:11:57.7023674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7023801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7023955Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7024505Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7024631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7024693Z graph_break [] 2025-12-04T10:11:57.7024814Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7025511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7025581Z if out == self.unknown_value: 2025-12-04T10:11:57.7025704Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7025800Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7025934Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7026477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7026541Z graph_break [] 2025-12-04T10:11:57.7026624Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7026988Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7027062Z Traceback (most recent call last): 2025-12-04T10:11:57.7027362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7027465Z method(*args, **kwargs) 2025-12-04T10:11:57.7027755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7027821Z method(*args, **kwargs) 2025-12-04T10:11:57.7028110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7028168Z with policy(): 2025-12-04T10:11:57.7028464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7028529Z raise RuntimeError(msg) 2025-12-04T10:11:57.7029333Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7029344Z 2025-12-04T10:11:57.7029473Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7029990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7029994Z 2025-12-04T10:11:57.7030153Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7030278Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7030369Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7030913Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7031037Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7031154Z graph_break [] 2025-12-04T10:11:57.7031276Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7031969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7032044Z if out == self.unknown_value: 2025-12-04T10:11:57.7032166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7032262Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7032382Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7032919Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7032984Z graph_break [] 2025-12-04T10:11:57.7033104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7033195Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7033315Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7033850Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7033979Z graph_break [] 2025-12-04T10:11:57.7034468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml - 2025-12-04T10:11:57.7034569Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7035877Z FAILED [0.4978s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7035883Z 2025-12-04T10:11:57.7036011Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7036530Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7036533Z 2025-12-04T10:11:57.7036687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7036800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7036915Z ================== 1 failed, 57 deselected, 2 rerun in 12.04s ================== 2025-12-04T10:11:57.7036975Z Got exit code 1 2025-12-04T10:11:57.7037038Z Retrying single test... 2025-12-04T10:11:57.7037303Z W1204 09:48:17.434000 49789 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7037695Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml 2025-12-04T10:11:57.7037792Z ============================= test session starts ============================== 2025-12-04T10:11:57.7038015Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7038080Z cachedir: .pytest_cache 2025-12-04T10:11:57.7038391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7038512Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7038578Z configfile: pytest.ini 2025-12-04T10:11:57.7038891Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7039024Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7039590Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7039664Z Running 1 items in this shard 2025-12-04T10:11:57.7039667Z 2025-12-04T10:11:57.7040434Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:19.031560224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7040443Z 2025-12-04T10:11:57.7040746Z [W1204 09:48:28.272828993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7040750Z 2025-12-04T10:11:57.7041039Z [W1204 09:48:28.273076717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7041042Z 2025-12-04T10:11:57.7041398Z [W1204 09:48:28.279038520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7041409Z 2025-12-04T10:11:57.7041709Z [W1204 09:48:28.279580709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7041712Z 2025-12-04T10:11:57.7042032Z [W1204 09:48:28.279757852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7042037Z 2025-12-04T10:11:57.7042330Z [W1204 09:48:28.285274146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7042334Z 2025-12-04T10:11:57.7042621Z [W1204 09:48:28.285806115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7042624Z 2025-12-04T10:11:57.7042913Z [W1204 09:48:28.285972608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7042919Z 2025-12-04T10:11:57.7043000Z ('RERUN', {'yellow': True}) [11.1960s] [100%] 2025-12-04T10:11:57.7043726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:29.089755750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7043732Z 2025-12-04T10:11:57.7044022Z [W1204 09:48:29.090337090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7044025Z 2025-12-04T10:11:57.7044319Z [W1204 09:48:29.090486943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7044323Z 2025-12-04T10:11:57.7044612Z [W1204 09:48:29.093535965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7044618Z 2025-12-04T10:11:57.7044901Z [W1204 09:48:29.093996173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7044907Z 2025-12-04T10:11:57.7045195Z [W1204 09:48:29.094134735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7045237Z 2025-12-04T10:11:57.7045522Z [W1204 09:48:29.098764844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7045525Z 2025-12-04T10:11:57.7045815Z [W1204 09:48:29.099225862 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7045818Z 2025-12-04T10:11:57.7046102Z [W1204 09:48:29.099361704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7046105Z 2025-12-04T10:11:57.7046190Z ('RERUN', {'yellow': True}) [0.4982s] [100%] 2025-12-04T10:11:57.7046905Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:29.585067686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7046911Z 2025-12-04T10:11:57.7047206Z [W1204 09:48:29.585629596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7047208Z 2025-12-04T10:11:57.7047494Z [W1204 09:48:29.585772148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7047496Z 2025-12-04T10:11:57.7047781Z [W1204 09:48:29.588770630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7047855Z 2025-12-04T10:11:57.7048143Z [W1204 09:48:29.589235468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7048146Z 2025-12-04T10:11:57.7048433Z [W1204 09:48:29.589373411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7048472Z 2025-12-04T10:11:57.7048760Z [W1204 09:48:29.594058680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7048763Z 2025-12-04T10:11:57.7049054Z [W1204 09:48:29.594533509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7049057Z 2025-12-04T10:11:57.7049347Z [W1204 09:48:29.594671611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7049351Z 2025-12-04T10:11:57.7049414Z FAILED [0.4950s] [100%] 2025-12-04T10:11:57.7049417Z 2025-12-04T10:11:57.7049502Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7049792Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7049868Z Traceback (most recent call last): 2025-12-04T10:11:57.7050180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7050245Z method(*args, **kwargs) 2025-12-04T10:11:57.7050538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7050603Z method(*args, **kwargs) 2025-12-04T10:11:57.7050891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7050953Z with policy(): 2025-12-04T10:11:57.7051247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7051324Z raise RuntimeError(msg) 2025-12-04T10:11:57.7052125Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7052169Z 2025-12-04T10:11:57.7052300Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7052819Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7052823Z 2025-12-04T10:11:57.7052986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7053118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7053211Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7053763Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7053892Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7053954Z graph_break [] 2025-12-04T10:11:57.7054076Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7054768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7054901Z if out == self.unknown_value: 2025-12-04T10:11:57.7055201Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7055279Z Traceback (most recent call last): 2025-12-04T10:11:57.7055628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7055697Z method(*args, **kwargs) 2025-12-04T10:11:57.7055983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7056043Z method(*args, **kwargs) 2025-12-04T10:11:57.7056336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7056395Z with policy(): 2025-12-04T10:11:57.7056689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7056759Z raise RuntimeError(msg) 2025-12-04T10:11:57.7057555Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7057562Z 2025-12-04T10:11:57.7057691Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7058206Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7058210Z 2025-12-04T10:11:57.7058369Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7058493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7058584Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7059129Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7059293Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7059354Z graph_break [] 2025-12-04T10:11:57.7059476Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7060159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7060231Z if out == self.unknown_value: 2025-12-04T10:11:57.7060353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7060449Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7060578Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7061118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7061180Z graph_break [] 2025-12-04T10:11:57.7061269Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7061557Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7061635Z Traceback (most recent call last): 2025-12-04T10:11:57.7062080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7062148Z method(*args, **kwargs) 2025-12-04T10:11:57.7062437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7062498Z method(*args, **kwargs) 2025-12-04T10:11:57.7062828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7062887Z with policy(): 2025-12-04T10:11:57.7063184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7063252Z raise RuntimeError(msg) 2025-12-04T10:11:57.7064067Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7064071Z 2025-12-04T10:11:57.7064210Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7064726Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7064733Z 2025-12-04T10:11:57.7064895Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7065021Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7065115Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7065663Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7065792Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7065851Z graph_break [] 2025-12-04T10:11:57.7065974Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7066657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7066771Z if out == self.unknown_value: 2025-12-04T10:11:57.7066890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7066982Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7067104Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7067649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7067711Z graph_break [] 2025-12-04T10:11:57.7067833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7067922Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7068047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7068581Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7068641Z graph_break [] 2025-12-04T10:11:57.7069127Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml - 2025-12-04T10:11:57.7069295Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7070570Z FAILED [0.4950s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7070608Z 2025-12-04T10:11:57.7070737Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7071257Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7071261Z 2025-12-04T10:11:57.7071420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7071527Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7071642Z ================== 1 failed, 57 deselected, 2 rerun in 12.21s ================== 2025-12-04T10:11:57.7071704Z Got exit code 1 2025-12-04T10:11:57.7072174Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7072416Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7072682Z W1204 09:48:36.292000 49976 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7073068Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml 2025-12-04T10:11:57.7073168Z ============================= test session starts ============================== 2025-12-04T10:11:57.7073377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7073444Z cachedir: .pytest_cache 2025-12-04T10:11:57.7073754Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7073869Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7073934Z configfile: pytest.ini 2025-12-04T10:11:57.7074253Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7074381Z collecting ... collected 58 items / 28 deselected / 30 selected 2025-12-04T10:11:57.7074469Z stepcurrent: skipping 28 already run items. 2025-12-04T10:11:57.7074540Z Running 30 items in this shard 2025-12-04T10:11:57.7074544Z 2025-12-04T10:11:57.7075048Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8873s] [ 3%] 2025-12-04T10:11:57.7075539Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4851s] [ 3%] 2025-12-04T10:11:57.7075982Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4820s] [ 3%] 2025-12-04T10:11:57.7075986Z 2025-12-04T10:11:57.7076074Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7076364Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7076437Z Traceback (most recent call last): 2025-12-04T10:11:57.7076814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7076882Z method(*args, **kwargs) 2025-12-04T10:11:57.7077175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.7077272Z method(*args, **kwargs) 2025-12-04T10:11:57.7077559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7077629Z with policy(): 2025-12-04T10:11:57.7077928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7077994Z raise RuntimeError(msg) 2025-12-04T10:11:57.7078807Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7078811Z 2025-12-04T10:11:57.7078936Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7079477Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7079483Z 2025-12-04T10:11:57.7079644Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7079780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7079912Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7080268Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7080398Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7080457Z graph_break [] 2025-12-04T10:11:57.7080748Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.7080868Z Traceback (most recent call last): 2025-12-04T10:11:57.7081172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7081237Z method(*args, **kwargs) 2025-12-04T10:11:57.7081526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7081589Z method(*args, **kwargs) 2025-12-04T10:11:57.7081883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7081942Z with policy(): 2025-12-04T10:11:57.7082237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7082306Z raise RuntimeError(msg) 2025-12-04T10:11:57.7083130Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7083136Z 2025-12-04T10:11:57.7083266Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7083784Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7083788Z 2025-12-04T10:11:57.7084030Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7084160Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7084252Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7084601Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7084775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7084837Z graph_break [] 2025-12-04T10:11:57.7084963Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7085051Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7085173Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7085515Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7085573Z graph_break [] 2025-12-04T10:11:57.7085662Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7085951Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7086029Z Traceback (most recent call last): 2025-12-04T10:11:57.7086321Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7086382Z method(*args, **kwargs) 2025-12-04T10:11:57.7086675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7086736Z method(*args, **kwargs) 2025-12-04T10:11:57.7087020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7087081Z with policy(): 2025-12-04T10:11:57.7087377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7087444Z raise RuntimeError(msg) 2025-12-04T10:11:57.7088261Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7088305Z 2025-12-04T10:11:57.7088438Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7088955Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7088959Z 2025-12-04T10:11:57.7089115Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7089241Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7089330Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7089673Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7089799Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7089856Z graph_break [] 2025-12-04T10:11:57.7089981Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7090068Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7090187Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7090595Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7090653Z graph_break [] 2025-12-04T10:11:57.7090777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7090864Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7091015Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7091358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7091415Z graph_break [] 2025-12-04T10:11:57.7091905Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml - 2025-12-04T10:11:57.7092007Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7093297Z FAILED [0.4820s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7093307Z 2025-12-04T10:11:57.7093430Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7093948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7093952Z 2025-12-04T10:11:57.7094109Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7094216Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7094338Z ================== 1 failed, 28 deselected, 2 rerun in 2.88s =================== 2025-12-04T10:11:57.7094397Z Got exit code 1 2025-12-04T10:11:57.7094461Z Retrying single test... 
2025-12-04T10:11:57.7094728Z W1204 09:48:45.897000 50164 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7095152Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml 2025-12-04T10:11:57.7095245Z ============================= test session starts ============================== 2025-12-04T10:11:57.7095464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7095530Z cachedir: .pytest_cache 2025-12-04T10:11:57.7095839Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7095917Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7095981Z configfile: pytest.ini 2025-12-04T10:11:57.7096300Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7096431Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7097003Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7097076Z Running 1 items in this shard 2025-12-04T10:11:57.7097080Z 2025-12-04T10:11:57.7097874Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:48:47.996320152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7097879Z 2025-12-04T10:11:57.7098183Z [W1204 09:48:56.146412430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7098186Z 2025-12-04T10:11:57.7098510Z [W1204 09:48:56.146673974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7098515Z 2025-12-04T10:11:57.7098808Z [W1204 09:48:56.152644417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7098811Z 2025-12-04T10:11:57.7099097Z [W1204 09:48:56.153235327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7099100Z 2025-12-04T10:11:57.7099387Z [W1204 09:48:56.153411510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7099393Z 2025-12-04T10:11:57.7099677Z [W1204 09:48:56.159002856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7099680Z 2025-12-04T10:11:57.7099965Z [W1204 09:48:56.159534675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7099970Z 2025-12-04T10:11:57.7100255Z [W1204 09:48:56.159694707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7100259Z 2025-12-04T10:11:57.7100342Z ('RERUN', {'yellow': True}) [11.0563s] [100%] 2025-12-04T10:11:57.7101071Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:48:57.373022273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7101077Z 2025-12-04T10:11:57.7101365Z [W1204 09:48:57.373548973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7101369Z 2025-12-04T10:11:57.7101654Z [W1204 09:48:57.373690215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7101696Z 2025-12-04T10:11:57.7101982Z [W1204 09:48:57.376679706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7101985Z 2025-12-04T10:11:57.7102276Z [W1204 09:48:57.377243446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7102280Z 2025-12-04T10:11:57.7102565Z [W1204 09:48:57.377381748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7102569Z 2025-12-04T10:11:57.7102859Z [W1204 09:48:57.381967147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7102862Z 2025-12-04T10:11:57.7103145Z [W1204 09:48:57.382439665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7103152Z 2025-12-04T10:11:57.7103438Z [W1204 09:48:57.382576277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7103441Z 2025-12-04T10:11:57.7103519Z ('RERUN', {'yellow': True}) [0.4566s] [100%] 2025-12-04T10:11:57.7104239Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:48:57.827120167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7104243Z 2025-12-04T10:11:57.7104601Z [W1204 09:48:57.827653116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7104605Z 2025-12-04T10:11:57.7104892Z [W1204 09:48:57.827793688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7104931Z 2025-12-04T10:11:57.7105228Z [W1204 09:48:57.830770069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7105231Z 2025-12-04T10:11:57.7105519Z [W1204 09:48:57.831333039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7105522Z 2025-12-04T10:11:57.7105812Z [W1204 09:48:57.831473191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7105816Z 2025-12-04T10:11:57.7106103Z [W1204 09:48:57.836037260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7106107Z 2025-12-04T10:11:57.7106392Z [W1204 09:48:57.836517458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7106398Z 2025-12-04T10:11:57.7106682Z [W1204 09:48:57.836654700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7106685Z 2025-12-04T10:11:57.7106747Z FAILED [0.4527s] [100%] 2025-12-04T10:11:57.7106750Z 2025-12-04T10:11:57.7106839Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7107136Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7107215Z Traceback (most recent call last): 2025-12-04T10:11:57.7107534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7107600Z method(*args, **kwargs) 2025-12-04T10:11:57.7107896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7108001Z method(*args, **kwargs) 2025-12-04T10:11:57.7108291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7108350Z with policy(): 2025-12-04T10:11:57.7108644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7108710Z raise RuntimeError(msg) 2025-12-04T10:11:57.7109520Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7109524Z 2025-12-04T10:11:57.7109654Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7110175Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7110181Z 2025-12-04T10:11:57.7110337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7110468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7110566Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7110918Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7111129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7111190Z graph_break [] 2025-12-04T10:11:57.7111318Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7112043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7112121Z if out == self.unknown_value: 2025-12-04T10:11:57.7112412Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7112483Z Traceback (most recent call last): 2025-12-04T10:11:57.7112789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7112856Z method(*args, **kwargs) 2025-12-04T10:11:57.7113151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7113216Z method(*args, **kwargs) 2025-12-04T10:11:57.7113502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7113568Z with policy(): 2025-12-04T10:11:57.7113860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7113923Z raise RuntimeError(msg) 2025-12-04T10:11:57.7114741Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7114745Z 2025-12-04T10:11:57.7114874Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7115393Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7115436Z 2025-12-04T10:11:57.7115597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7115720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7115817Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7116164Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7116293Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7116351Z graph_break [] 2025-12-04T10:11:57.7116477Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7117297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7117374Z if out == self.unknown_value: 2025-12-04T10:11:57.7117510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7117602Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7117726Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7118070Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7118128Z graph_break [] 2025-12-04T10:11:57.7118321Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7118621Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7118697Z Traceback (most recent call last): 2025-12-04T10:11:57.7119055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7119122Z method(*args, **kwargs) 2025-12-04T10:11:57.7119412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7119477Z method(*args, **kwargs) 2025-12-04T10:11:57.7119765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7119825Z with policy(): 2025-12-04T10:11:57.7120161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7120226Z raise RuntimeError(msg) 2025-12-04T10:11:57.7121044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7121052Z 2025-12-04T10:11:57.7121176Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7121715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7121719Z 2025-12-04T10:11:57.7121877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7122003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7122096Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7122438Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7122623Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7122681Z graph_break [] 2025-12-04T10:11:57.7122804Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7123488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7123557Z if out == self.unknown_value: 2025-12-04T10:11:57.7123679Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7123771Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7123892Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7124233Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7124295Z graph_break [] 2025-12-04T10:11:57.7124415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7124505Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7124623Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7124965Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7125022Z graph_break [] 2025-12-04T10:11:57.7125578Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml - 2025-12-04T10:11:57.7125684Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7126979Z FAILED [0.4527s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7127019Z 2025-12-04T10:11:57.7127149Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7127668Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7127672Z 2025-12-04T10:11:57.7127832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7127938Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7128054Z ================== 1 failed, 57 deselected, 2 rerun in 11.99s ================== 2025-12-04T10:11:57.7128117Z Got exit code 1 2025-12-04T10:11:57.7128180Z Retrying single test... 2025-12-04T10:11:57.7128445Z W1204 09:49:04.466000 50357 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7128833Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml 2025-12-04T10:11:57.7128926Z ============================= test session starts ============================== 2025-12-04T10:11:57.7129146Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7129212Z cachedir: .pytest_cache 2025-12-04T10:11:57.7129518Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7129638Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7129701Z configfile: pytest.ini 2025-12-04T10:11:57.7130015Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7130143Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7130713Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7130787Z Running 1 items in this shard 2025-12-04T10:11:57.7130793Z 2025-12-04T10:11:57.7131521Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:49:05.577562747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7131528Z 2025-12-04T10:11:57.7131828Z [W1204 09:49:14.680533992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7131832Z 2025-12-04T10:11:57.7132121Z [W1204 09:49:14.680781566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7132124Z 2025-12-04T10:11:57.7132418Z [W1204 09:49:14.686594506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7132421Z 2025-12-04T10:11:57.7132774Z [W1204 09:49:14.687180466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7132778Z 2025-12-04T10:11:57.7133071Z [W1204 09:49:14.687350149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7133110Z 2025-12-04T10:11:57.7133400Z [W1204 09:49:14.692941135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7133403Z 2025-12-04T10:11:57.7133689Z [W1204 09:49:14.693475344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7133692Z 2025-12-04T10:11:57.7133984Z [W1204 09:49:14.693637397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7133987Z 2025-12-04T10:11:57.7134069Z ('RERUN', {'yellow': True}) [11.0229s] [100%] 2025-12-04T10:11:57.7134797Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:49:15.909138679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7134804Z 2025-12-04T10:11:57.7135091Z [W1204 09:49:15.909670218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7135095Z 2025-12-04T10:11:57.7135382Z [W1204 09:49:15.909811320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7135385Z 2025-12-04T10:11:57.7135671Z [W1204 09:49:15.912828062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7135674Z 2025-12-04T10:11:57.7135964Z [W1204 09:49:15.913400282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7135968Z 2025-12-04T10:11:57.7136253Z [W1204 09:49:15.913539715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7136311Z 2025-12-04T10:11:57.7136597Z [W1204 09:49:15.918176324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7136604Z 2025-12-04T10:11:57.7136890Z [W1204 09:49:15.918644752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7136893Z 2025-12-04T10:11:57.7137180Z [W1204 09:49:15.918782205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7137183Z 2025-12-04T10:11:57.7137263Z ('RERUN', {'yellow': True}) [0.4531s] [100%] 2025-12-04T10:11:57.7137985Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:49:16.360601137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7137991Z 2025-12-04T10:11:57.7138282Z [W1204 09:49:16.361123296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7138285Z 2025-12-04T10:11:57.7138571Z [W1204 09:49:16.361260898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7138575Z 2025-12-04T10:11:57.7138861Z [W1204 09:49:16.364125508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7138864Z 2025-12-04T10:11:57.7139216Z [W1204 09:49:16.364687737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7139219Z 2025-12-04T10:11:57.7139518Z [W1204 09:49:16.364824990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7139554Z 2025-12-04T10:11:57.7139845Z [W1204 09:49:16.369276246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7139848Z 2025-12-04T10:11:57.7140138Z [W1204 09:49:16.369728024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7140143Z 2025-12-04T10:11:57.7140427Z [W1204 09:49:16.369861857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7140430Z 2025-12-04T10:11:57.7140490Z FAILED [0.4506s] [100%] 2025-12-04T10:11:57.7140494Z 2025-12-04T10:11:57.7140584Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7140877Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7140950Z Traceback (most recent call last): 2025-12-04T10:11:57.7141259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7141322Z method(*args, **kwargs) 2025-12-04T10:11:57.7141615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7141677Z method(*args, **kwargs) 2025-12-04T10:11:57.7141962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7142023Z with policy(): 2025-12-04T10:11:57.7142318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7142390Z raise RuntimeError(msg) 2025-12-04T10:11:57.7143197Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7143239Z 2025-12-04T10:11:57.7143368Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7143891Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7143894Z 2025-12-04T10:11:57.7144051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7144185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7144279Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7144625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7144759Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7144829Z graph_break [] 2025-12-04T10:11:57.7144959Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7145652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7145722Z if out == self.unknown_value: 2025-12-04T10:11:57.7146337Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7146417Z Traceback (most recent call last): 2025-12-04T10:11:57.7146711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7146809Z method(*args, **kwargs) 2025-12-04T10:11:57.7147098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7147163Z method(*args, **kwargs) 2025-12-04T10:11:57.7147449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7147508Z with policy(): 2025-12-04T10:11:57.7147801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7147866Z raise RuntimeError(msg) 2025-12-04T10:11:57.7148686Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7148692Z 2025-12-04T10:11:57.7148818Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7149341Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7149344Z 2025-12-04T10:11:57.7149498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7149621Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7149718Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7150062Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7150188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7150248Z graph_break [] 2025-12-04T10:11:57.7150408Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7151106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7151173Z if out == self.unknown_value: 2025-12-04T10:11:57.7151294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7151386Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7151510Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7151851Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7151909Z graph_break [] 2025-12-04T10:11:57.7151992Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7152298Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7152370Z Traceback (most recent call last): 2025-12-04T10:11:57.7152665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7152730Z method(*args, **kwargs) 2025-12-04T10:11:57.7153017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7153082Z method(*args, **kwargs) 2025-12-04T10:11:57.7153433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7153493Z with policy(): 2025-12-04T10:11:57.7153791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7153893Z raise RuntimeError(msg) 2025-12-04T10:11:57.7154714Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7154718Z 2025-12-04T10:11:57.7154843Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7155364Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7155371Z 2025-12-04T10:11:57.7155527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7155650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7155745Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7156085Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7156208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7156271Z graph_break [] 2025-12-04T10:11:57.7156393Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7157085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7157153Z if out == self.unknown_value: 2025-12-04T10:11:57.7157274Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7157421Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7157543Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7157885Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7157944Z graph_break [] 2025-12-04T10:11:57.7158065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7158157Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7158276Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7158617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7158678Z graph_break [] 2025-12-04T10:11:57.7159164Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml - 2025-12-04T10:11:57.7159270Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7160671Z FAILED [0.4506s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7160676Z 2025-12-04T10:11:57.7160810Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7161330Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7161392Z 2025-12-04T10:11:57.7161551Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7161671Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7161791Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.7161852Z Got exit code 1 2025-12-04T10:11:57.7162332Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7162575Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7162842Z W1204 09:49:23.039000 50550 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7163230Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml 2025-12-04T10:11:57.7163329Z ============================= test session starts ============================== 2025-12-04T10:11:57.7163536Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7163602Z cachedir: .pytest_cache 2025-12-04T10:11:57.7163905Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7163984Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7164051Z configfile: pytest.ini 
2025-12-04T10:11:57.7164364Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7164490Z collecting ... collected 58 items / 29 deselected / 29 selected 2025-12-04T10:11:57.7164620Z stepcurrent: skipping 29 already run items. 2025-12-04T10:11:57.7164689Z Running 29 items in this shard 2025-12-04T10:11:57.7164693Z 2025-12-04T10:11:57.7165188Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8542s] [ 3%] 2025-12-04T10:11:57.7165672Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4390s] [ 3%] 2025-12-04T10:11:57.7166119Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4334s] [ 3%] 2025-12-04T10:11:57.7166123Z 2025-12-04T10:11:57.7166207Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7166493Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7166569Z Traceback (most recent call last): 2025-12-04T10:11:57.7166875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7166939Z method(*args, **kwargs) 2025-12-04T10:11:57.7167235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7167297Z method(*args, **kwargs) 2025-12-04T10:11:57.7167650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.7167716Z with policy(): 2025-12-04T10:11:57.7168011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7168079Z raise RuntimeError(msg) 2025-12-04T10:11:57.7168905Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7168911Z 2025-12-04T10:11:57.7169036Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7169555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7169559Z 2025-12-04T10:11:57.7169728Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7169861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7169953Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7170303Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7170433Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7170491Z graph_break [] 2025-12-04T10:11:57.7170779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7170852Z Traceback (most recent call last): 2025-12-04T10:11:57.7171144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:11:57.7171210Z method(*args, **kwargs) 2025-12-04T10:11:57.7171499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7171566Z method(*args, **kwargs) 2025-12-04T10:11:57.7171856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7171952Z with policy(): 2025-12-04T10:11:57.7172253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7172319Z raise RuntimeError(msg) 2025-12-04T10:11:57.7173120Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7173129Z 2025-12-04T10:11:57.7173254Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7173766Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7173773Z 2025-12-04T10:11:57.7173928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7174052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7174147Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7174486Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7174612Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7174673Z graph_break [] 2025-12-04T10:11:57.7174942Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7175032Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7175157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7175542Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7175604Z graph_break [] 2025-12-04T10:11:57.7175688Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7175970Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7176045Z Traceback (most recent call last): 2025-12-04T10:11:57.7176338Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7176404Z method(*args, **kwargs) 2025-12-04T10:11:57.7176696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7176757Z method(*args, **kwargs) 2025-12-04T10:11:57.7177050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7177111Z with policy(): 2025-12-04T10:11:57.7177400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7177469Z raise RuntimeError(msg) 2025-12-04T10:11:57.7178275Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7178279Z 2025-12-04T10:11:57.7178405Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7178918Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7178959Z 2025-12-04T10:11:57.7179116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7179250Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7179342Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7179688Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7179810Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7179870Z graph_break [] 2025-12-04T10:11:57.7179998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7180083Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7180205Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7180545Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7180601Z graph_break [] 2025-12-04T10:11:57.7180726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7180818Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7180946Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7181358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7181419Z graph_break [] 2025-12-04T10:11:57.7181906Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml - 2025-12-04T10:11:57.7182056Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7183335Z FAILED [0.4334s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7183340Z 2025-12-04T10:11:57.7183471Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7183991Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7184001Z 2025-12-04T10:11:57.7184160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7184267Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7184389Z ================== 1 failed, 29 deselected, 2 rerun in 2.75s =================== 2025-12-04T10:11:57.7184447Z Got exit code 1 2025-12-04T10:11:57.7184512Z Retrying single test... 
2025-12-04T10:11:57.7184779Z W1204 09:49:32.687000 50731 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7185175Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml 2025-12-04T10:11:57.7185278Z ============================= test session starts ============================== 2025-12-04T10:11:57.7185488Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7185551Z cachedir: .pytest_cache 2025-12-04T10:11:57.7185905Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7185981Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7186047Z configfile: pytest.ini 2025-12-04T10:11:57.7186367Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7186494Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7187070Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7187142Z Running 1 items in this shard 2025-12-04T10:11:57.7187145Z 2025-12-04T10:11:57.7187867Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:33.729351825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7187878Z 2025-12-04T10:11:57.7188180Z [W1204 09:49:42.741671494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7188183Z 2025-12-04T10:11:57.7188473Z [W1204 09:49:42.741926409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7188479Z 2025-12-04T10:11:57.7188847Z [W1204 09:49:42.747831789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7188851Z 2025-12-04T10:11:57.7189140Z [W1204 09:49:42.748423620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7189177Z 2025-12-04T10:11:57.7189469Z [W1204 09:49:42.748590933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7189474Z 2025-12-04T10:11:57.7189758Z [W1204 09:49:42.754175029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7189761Z 2025-12-04T10:11:57.7190050Z [W1204 09:49:42.754712308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7190053Z 2025-12-04T10:11:57.7190345Z [W1204 09:49:42.754880291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7190348Z 2025-12-04T10:11:57.7190432Z ('RERUN', {'yellow': True}) [10.8619s] [100%] 2025-12-04T10:11:57.7191153Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:43.929774104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7191160Z 2025-12-04T10:11:57.7191447Z [W1204 09:49:43.930352243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7191454Z 2025-12-04T10:11:57.7191740Z [W1204 09:49:43.930496516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7191743Z 2025-12-04T10:11:57.7192029Z [W1204 09:49:43.933387725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7192032Z 2025-12-04T10:11:57.7192329Z [W1204 09:49:43.933942555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7192332Z 2025-12-04T10:11:57.7192622Z [W1204 09:49:43.934081517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7192661Z 2025-12-04T10:11:57.7192950Z [W1204 09:49:43.938537663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7192953Z 2025-12-04T10:11:57.7193237Z [W1204 09:49:43.938993931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7193241Z 2025-12-04T10:11:57.7193527Z [W1204 09:49:43.939129893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7193533Z 2025-12-04T10:11:57.7193612Z ('RERUN', {'yellow': True}) [0.4178s] [100%] 2025-12-04T10:11:57.7194331Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:44.346606967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7194337Z 2025-12-04T10:11:57.7194626Z [W1204 09:49:44.347157466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7194629Z 2025-12-04T10:11:57.7194914Z [W1204 09:49:44.347295879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7194924Z 2025-12-04T10:11:57.7195213Z [W1204 09:49:44.350202289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7195282Z 2025-12-04T10:11:57.7195569Z [W1204 09:49:44.350759028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7195572Z 2025-12-04T10:11:57.7195864Z [W1204 09:49:44.350896171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7195901Z 2025-12-04T10:11:57.7196186Z [W1204 09:49:44.355343777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7196189Z 2025-12-04T10:11:57.7196480Z [W1204 09:49:44.355799335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7196483Z 2025-12-04T10:11:57.7196770Z [W1204 09:49:44.355935398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7196773Z 2025-12-04T10:11:57.7196840Z FAILED [0.4150s] [100%] 2025-12-04T10:11:57.7196843Z 2025-12-04T10:11:57.7196926Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7197216Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7197297Z Traceback (most recent call last): 2025-12-04T10:11:57.7197608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7197675Z method(*args, **kwargs) 2025-12-04T10:11:57.7197967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7198031Z method(*args, **kwargs) 2025-12-04T10:11:57.7198333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7198399Z with policy(): 2025-12-04T10:11:57.7198696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7198764Z raise RuntimeError(msg) 2025-12-04T10:11:57.7199556Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.7199599Z 2025-12-04T10:11:57.7199736Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7200292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7200296Z 2025-12-04T10:11:57.7200461Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7200588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7200683Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7201035Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7201166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7201224Z graph_break [] 2025-12-04T10:11:57.7201352Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7202041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7202186Z if out == self.unknown_value: 2025-12-04T10:11:57.7202479Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7202552Z Traceback (most recent call last): 2025-12-04T10:11:57.7202853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7202954Z method(*args, **kwargs) 2025-12-04T10:11:57.7203250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7203318Z method(*args, **kwargs) 2025-12-04T10:11:57.7203613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7203676Z with policy(): 2025-12-04T10:11:57.7203972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7204040Z raise RuntimeError(msg) 2025-12-04T10:11:57.7204846Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7204853Z 2025-12-04T10:11:57.7204977Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7205495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7205499Z 2025-12-04T10:11:57.7205663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7205794Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7205888Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7206235Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7206368Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7206465Z graph_break [] 2025-12-04T10:11:57.7206589Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7207287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7207359Z if out == self.unknown_value: 2025-12-04T10:11:57.7207483Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7207575Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7207697Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7208041Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7208102Z graph_break [] 2025-12-04T10:11:57.7208189Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7208476Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7208548Z Traceback (most recent call last): 2025-12-04T10:11:57.7208847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7208910Z method(*args, **kwargs) 2025-12-04T10:11:57.7209267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7209330Z method(*args, **kwargs) 2025-12-04T10:11:57.7209628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7209694Z with policy(): 2025-12-04T10:11:57.7210024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7210089Z raise RuntimeError(msg) 2025-12-04T10:11:57.7210893Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7210898Z 2025-12-04T10:11:57.7211019Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7211536Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7211539Z 2025-12-04T10:11:57.7211693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7211822Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7211912Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7212254Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7212379Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7212437Z graph_break [] 2025-12-04T10:11:57.7212559Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7213250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7213318Z if out == self.unknown_value: 2025-12-04T10:11:57.7213493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7213587Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7213710Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7214053Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7214110Z graph_break [] 2025-12-04T10:11:57.7214232Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7214320Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7214443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7214780Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7214838Z graph_break [] 2025-12-04T10:11:57.7215324Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml - 2025-12-04T10:11:57.7215425Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7216786Z FAILED [0.4150s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7216794Z 2025-12-04T10:11:57.7216937Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7217846Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7217851Z 2025-12-04T10:11:57.7218020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7218130Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7218247Z ================== 1 failed, 57 deselected, 2 rerun in 11.72s ================== 2025-12-04T10:11:57.7218308Z Got exit code 1 2025-12-04T10:11:57.7218373Z Retrying single test... 2025-12-04T10:11:57.7218646Z W1204 09:49:51.046000 50917 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7219033Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml 2025-12-04T10:11:57.7219138Z ============================= test session starts ============================== 2025-12-04T10:11:57.7219357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7219422Z cachedir: .pytest_cache 2025-12-04T10:11:57.7219732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7219810Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7219873Z configfile: pytest.ini 2025-12-04T10:11:57.7220192Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7220326Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7220902Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7221051Z Running 1 items in this shard 2025-12-04T10:11:57.7221055Z 2025-12-04T10:11:57.7221781Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:52.102377034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7221786Z 2025-12-04T10:11:57.7222086Z [W1204 09:50:01.190690714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7222090Z 2025-12-04T10:11:57.7222386Z [W1204 09:50:01.190949308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7222389Z 2025-12-04T10:11:57.7222679Z [W1204 09:50:01.196751798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7222685Z 2025-12-04T10:11:57.7222980Z [W1204 09:50:01.197337837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7222983Z 2025-12-04T10:11:57.7223276Z [W1204 09:50:01.197507511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7223279Z 2025-12-04T10:11:57.7223564Z [W1204 09:50:01.202947574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7223567Z 2025-12-04T10:11:57.7223949Z [W1204 09:50:01.203489293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7223953Z 2025-12-04T10:11:57.7224241Z [W1204 09:50:01.203652446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7224289Z 2025-12-04T10:11:57.7224373Z ('RERUN', {'yellow': True}) [10.9501s] [100%] 2025-12-04T10:11:57.7225096Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:50:02.375854989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7225099Z 2025-12-04T10:11:57.7225387Z [W1204 09:50:02.376432368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7225391Z 2025-12-04T10:11:57.7225681Z [W1204 09:50:02.376577431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7225684Z 2025-12-04T10:11:57.7225970Z [W1204 09:50:02.379478830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7225977Z 2025-12-04T10:11:57.7226264Z [W1204 09:50:02.380052110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7226268Z 2025-12-04T10:11:57.7226556Z [W1204 09:50:02.380194963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7226559Z 2025-12-04T10:11:57.7226846Z [W1204 09:50:02.384668970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7226850Z 2025-12-04T10:11:57.7227135Z [W1204 09:50:02.385126448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7227138Z 2025-12-04T10:11:57.7227426Z [W1204 09:50:02.385262060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7227434Z 2025-12-04T10:11:57.7227549Z ('RERUN', {'yellow': True}) [0.4127s] [100%] 2025-12-04T10:11:57.7228258Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:50:02.785795632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7228262Z 2025-12-04T10:11:57.7228564Z [W1204 09:50:02.786347632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7228568Z 2025-12-04T10:11:57.7228858Z [W1204 09:50:02.786490804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7228861Z 2025-12-04T10:11:57.7229151Z [W1204 09:50:02.789372683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7229155Z 2025-12-04T10:11:57.7229445Z [W1204 09:50:02.789935613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7229448Z 2025-12-04T10:11:57.7229736Z [W1204 09:50:02.790098846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7229739Z 2025-12-04T10:11:57.7230025Z [W1204 09:50:02.794551662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7230028Z 2025-12-04T10:11:57.7230384Z [W1204 09:50:02.795010100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7230387Z 2025-12-04T10:11:57.7230675Z [W1204 09:50:02.795145413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7230711Z 2025-12-04T10:11:57.7230773Z FAILED [0.4079s] [100%] 2025-12-04T10:11:57.7230778Z 2025-12-04T10:11:57.7230868Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7231153Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7231230Z Traceback (most recent call last): 2025-12-04T10:11:57.7231537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7231606Z method(*args, **kwargs) 2025-12-04T10:11:57.7231908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7231971Z method(*args, **kwargs) 2025-12-04T10:11:57.7232254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7232319Z with policy(): 2025-12-04T10:11:57.7232614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7232685Z raise RuntimeError(msg) 2025-12-04T10:11:57.7233479Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7233484Z 2025-12-04T10:11:57.7233621Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7234138Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7234142Z 2025-12-04T10:11:57.7234298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7234468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7234565Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7234920Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7235047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7235106Z graph_break [] 2025-12-04T10:11:57.7235234Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7235939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7236013Z if out == self.unknown_value: 2025-12-04T10:11:57.7236307Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7236382Z Traceback (most recent call last): 2025-12-04T10:11:57.7236685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7236749Z method(*args, **kwargs) 2025-12-04T10:11:57.7237041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7237107Z method(*args, **kwargs) 2025-12-04T10:11:57.7237468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7237534Z with policy(): 2025-12-04T10:11:57.7237827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7237925Z raise RuntimeError(msg) 2025-12-04T10:11:57.7238740Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7238744Z 2025-12-04T10:11:57.7238876Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7239397Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7239401Z 2025-12-04T10:11:57.7239561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7239686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7239785Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7240190Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7240323Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7240383Z graph_break [] 2025-12-04T10:11:57.7240506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7241205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7241274Z if out == self.unknown_value: 2025-12-04T10:11:57.7241397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7241487Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7241672Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7242015Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7242073Z graph_break [] 2025-12-04T10:11:57.7242156Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7242452Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7242524Z Traceback (most recent call last): 2025-12-04T10:11:57.7242829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7242892Z method(*args, **kwargs) 2025-12-04T10:11:57.7243183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7243253Z method(*args, **kwargs) 2025-12-04T10:11:57.7243540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7243598Z with policy(): 2025-12-04T10:11:57.7243895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7243960Z raise RuntimeError(msg) 2025-12-04T10:11:57.7244841Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7244845Z 2025-12-04T10:11:57.7244969Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7245518Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7245524Z 2025-12-04T10:11:57.7245682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7245807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7245899Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7246245Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7246377Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7246443Z graph_break [] 2025-12-04T10:11:57.7246568Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7247259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7247330Z if out == self.unknown_value: 2025-12-04T10:11:57.7247451Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7247544Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7247667Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7248016Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7248073Z graph_break [] 2025-12-04T10:11:57.7248195Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7248285Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7248407Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7248790Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7248849Z graph_break [] 2025-12-04T10:11:57.7249332Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml - 2025-12-04T10:11:57.7249437Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7250724Z FAILED [0.4079s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7250731Z 2025-12-04T10:11:57.7250859Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7251371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7251375Z 2025-12-04T10:11:57.7251532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7251704Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7251823Z ================== 1 failed, 57 deselected, 2 rerun in 11.79s ================== 2025-12-04T10:11:57.7251885Z Got exit code 1 2025-12-04T10:11:57.7252358Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7252642Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7252906Z W1204 09:50:09.437000 51103 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7253290Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml 2025-12-04T10:11:57.7253388Z ============================= test session starts ============================== 2025-12-04T10:11:57.7253601Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7253668Z cachedir: .pytest_cache 2025-12-04T10:11:57.7253979Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7254059Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7254127Z configfile: pytest.ini 
2025-12-04T10:11:57.7254442Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7254570Z collecting ... collected 58 items / 30 deselected / 28 selected 2025-12-04T10:11:57.7254659Z stepcurrent: skipping 30 already run items. 2025-12-04T10:11:57.7254729Z Running 28 items in this shard 2025-12-04T10:11:57.7254732Z 2025-12-04T10:11:57.7255230Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9267s] [ 3%] 2025-12-04T10:11:57.7255722Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5301s] [ 3%] 2025-12-04T10:11:57.7256163Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5241s] [ 3%] 2025-12-04T10:11:57.7256204Z 2025-12-04T10:11:57.7256291Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7256575Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7256650Z Traceback (most recent call last): 2025-12-04T10:11:57.7256958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7257024Z method(*args, **kwargs) 2025-12-04T10:11:57.7257323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7257385Z method(*args, **kwargs) 2025-12-04T10:11:57.7257682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.7257743Z with policy(): 2025-12-04T10:11:57.7258038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7258110Z raise RuntimeError(msg) 2025-12-04T10:11:57.7258905Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7258975Z 2025-12-04T10:11:57.7259110Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7259630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7259669Z 2025-12-04T10:11:57.7259828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7259958Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7260052Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7260604Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7260735Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7260792Z graph_break [] 2025-12-04T10:11:57.7261088Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7261161Z Traceback (most recent call last): 2025-12-04T10:11:57.7261471Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7261539Z method(*args, **kwargs) 2025-12-04T10:11:57.7261835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7261901Z method(*args, **kwargs) 2025-12-04T10:11:57.7262189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7262247Z with policy(): 2025-12-04T10:11:57.7262549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7262615Z raise RuntimeError(msg) 2025-12-04T10:11:57.7263426Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7263468Z 2025-12-04T10:11:57.7263593Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7264105Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7264111Z 2025-12-04T10:11:57.7264268Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7264397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7264492Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7265037Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7265168Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7265226Z graph_break [] 2025-12-04T10:11:57.7265358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7265451Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7265574Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7266181Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7266243Z graph_break [] 2025-12-04T10:11:57.7266325Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7266614Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7266738Z 
Traceback (most recent call last): 2025-12-04T10:11:57.7267039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7267105Z method(*args, **kwargs) 2025-12-04T10:11:57.7267408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7267471Z method(*args, **kwargs) 2025-12-04T10:11:57.7267767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7267829Z with policy(): 2025-12-04T10:11:57.7268127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7268191Z raise RuntimeError(msg) 2025-12-04T10:11:57.7269001Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7269010Z 2025-12-04T10:11:57.7269134Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7269651Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7269655Z 2025-12-04T10:11:57.7269817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7269942Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7270033Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7270579Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7270751Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7270816Z graph_break [] 2025-12-04T10:11:57.7270938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7271027Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7271152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7271694Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7271754Z graph_break [] 2025-12-04T10:11:57.7271876Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7271965Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7272088Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.7272625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7272686Z graph_break [] 2025-12-04T10:11:57.7273243Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml - 2025-12-04T10:11:57.7273344Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7274623Z FAILED [0.5241s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7274661Z 2025-12-04T10:11:57.7274787Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7275312Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7275315Z 2025-12-04T10:11:57.7275468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7275574Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7275697Z ================== 1 failed, 30 deselected, 2 rerun in 3.01s =================== 2025-12-04T10:11:57.7275755Z Got exit code 1 2025-12-04T10:11:57.7275823Z Retrying single test... 
2025-12-04T10:11:57.7276083Z W1204 09:50:19.064000 51285 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7276469Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml 2025-12-04T10:11:57.7276567Z ============================= test session starts ============================== 2025-12-04T10:11:57.7276788Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7276859Z cachedir: .pytest_cache 2025-12-04T10:11:57.7277171Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7277248Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7277358Z configfile: pytest.ini 2025-12-04T10:11:57.7277672Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7277805Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7278369Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7278437Z Running 1 items in this shard 2025-12-04T10:11:57.7278441Z 2025-12-04T10:11:57.7279175Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:20.662417415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7279181Z 2025-12-04T10:11:57.7279478Z [W1204 09:50:29.674002233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7279482Z 2025-12-04T10:11:57.7279778Z [W1204 09:50:29.674268687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7279781Z 2025-12-04T10:11:57.7280107Z [W1204 09:50:29.680496844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7280110Z 2025-12-04T10:11:57.7280486Z [W1204 09:50:29.681112204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7280490Z 2025-12-04T10:11:57.7280779Z [W1204 09:50:29.681288057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7280815Z 2025-12-04T10:11:57.7281106Z [W1204 09:50:29.686799522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7281111Z 2025-12-04T10:11:57.7281413Z [W1204 09:50:29.687338041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7281417Z 2025-12-04T10:11:57.7281704Z [W1204 09:50:29.687500894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7281712Z 2025-12-04T10:11:57.7281793Z ('RERUN', {'yellow': True}) [10.9704s] [100%] 2025-12-04T10:11:57.7282517Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:30.492915707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7282523Z 2025-12-04T10:11:57.7282816Z [W1204 09:50:30.493476687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7282821Z 2025-12-04T10:11:57.7283107Z [W1204 09:50:30.493615639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7283111Z 2025-12-04T10:11:57.7283401Z [W1204 09:50:30.496485908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7283404Z 2025-12-04T10:11:57.7283691Z [W1204 09:50:30.496935696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7283694Z 2025-12-04T10:11:57.7283984Z [W1204 09:50:30.497076709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7283987Z 2025-12-04T10:11:57.7284272Z [W1204 09:50:30.501652387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7284391Z 2025-12-04T10:11:57.7284687Z [W1204 09:50:30.502116105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7284692Z 2025-12-04T10:11:57.7284984Z [W1204 09:50:30.502253307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7284989Z 2025-12-04T10:11:57.7285068Z ('RERUN', {'yellow': True}) [0.5014s] [100%] 2025-12-04T10:11:57.7285789Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:31.992764154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7285793Z 2025-12-04T10:11:57.7286084Z [W1204 09:50:31.993299073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7286089Z 2025-12-04T10:11:57.7286379Z [W1204 09:50:31.993439555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7286382Z 2025-12-04T10:11:57.7286669Z [W1204 09:50:31.996344355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7286672Z 2025-12-04T10:11:57.7286959Z [W1204 09:50:31.996796433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7287029Z 2025-12-04T10:11:57.7287320Z [W1204 09:50:31.996934205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7287323Z 2025-12-04T10:11:57.7287611Z [W1204 09:50:31.001491483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7287648Z 2025-12-04T10:11:57.7287939Z [W1204 09:50:31.001948341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7287942Z 2025-12-04T10:11:57.7288228Z [W1204 09:50:31.002084503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7288235Z 2025-12-04T10:11:57.7288294Z FAILED [0.4984s] [100%] 2025-12-04T10:11:57.7288298Z 2025-12-04T10:11:57.7288384Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7288676Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7288749Z Traceback (most recent call last): 2025-12-04T10:11:57.7289055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7289127Z method(*args, **kwargs) 2025-12-04T10:11:57.7289419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7289487Z method(*args, **kwargs) 2025-12-04T10:11:57.7289784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7289844Z with policy(): 2025-12-04T10:11:57.7290142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7290210Z raise RuntimeError(msg) 2025-12-04T10:11:57.7291003Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.7291046Z 2025-12-04T10:11:57.7291175Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7291691Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7291698Z 2025-12-04T10:11:57.7291854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7291986Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7292085Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7292630Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7292760Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7292824Z graph_break [] 2025-12-04T10:11:57.7292948Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7293642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7293713Z if out == self.unknown_value: 2025-12-04T10:11:57.7294092Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7294173Z Traceback (most recent call last): 2025-12-04T10:11:57.7294471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7294571Z method(*args, **kwargs) 2025-12-04T10:11:57.7294868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7294932Z method(*args, **kwargs) 2025-12-04T10:11:57.7295226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7295285Z with policy(): 2025-12-04T10:11:57.7295581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7295649Z raise RuntimeError(msg) 2025-12-04T10:11:57.7296446Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7296451Z 2025-12-04T10:11:57.7296581Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7297092Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7297096Z 2025-12-04T10:11:57.7297255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7297380Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7297472Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7298018Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7298143Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7298244Z graph_break [] 2025-12-04T10:11:57.7298366Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7299053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7299126Z if out == self.unknown_value: 2025-12-04T10:11:57.7299246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7299336Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7299465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7300003Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7300067Z graph_break [] 2025-12-04T10:11:57.7300150Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7300435Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7300510Z Traceback (most recent call last): 2025-12-04T10:11:57.7300811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7300878Z method(*args, **kwargs) 2025-12-04T10:11:57.7301239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7301304Z method(*args, **kwargs) 2025-12-04T10:11:57.7301596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7301688Z with policy(): 2025-12-04T10:11:57.7301985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7302063Z raise RuntimeError(msg) 2025-12-04T10:11:57.7302872Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7302877Z 2025-12-04T10:11:57.7303062Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7303865Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7303871Z 2025-12-04T10:11:57.7304042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7304176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7304270Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7304814Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7304944Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7305005Z graph_break [] 2025-12-04T10:11:57.7305135Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7305829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7305976Z if out == self.unknown_value: 2025-12-04T10:11:57.7306104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7306197Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7306324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7306872Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7306934Z graph_break [] 2025-12-04T10:11:57.7307059Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7307148Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7307270Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7307805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7307866Z graph_break [] 2025-12-04T10:11:57.7308364Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml - 2025-12-04T10:11:57.7308468Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7309834Z FAILED [0.4984s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7309881Z 2025-12-04T10:11:57.7310016Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7310540Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7310544Z 2025-12-04T10:11:57.7310709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7310819Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7310936Z ================== 1 failed, 57 deselected, 2 rerun in 11.99s ================== 2025-12-04T10:11:57.7310997Z Got exit code 1 2025-12-04T10:11:57.7311065Z Retrying single test... 2025-12-04T10:11:57.7311330Z W1204 09:50:37.603000 51472 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7311718Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml 2025-12-04T10:11:57.7311813Z ============================= test session starts ============================== 2025-12-04T10:11:57.7312022Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7312092Z cachedir: .pytest_cache 2025-12-04T10:11:57.7312402Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7312481Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7312551Z configfile: pytest.ini 2025-12-04T10:11:57.7312866Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7313001Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7313609Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7313679Z Running 1 items in this shard 2025-12-04T10:11:57.7313683Z 2025-12-04T10:11:57.7314412Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:39.196800294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7314419Z 2025-12-04T10:11:57.7314719Z [W1204 09:50:48.201438486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7314723Z 2025-12-04T10:11:57.7315017Z [W1204 09:50:48.201701410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7315024Z 2025-12-04T10:11:57.7315314Z [W1204 09:50:48.207715623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7315317Z 2025-12-04T10:11:57.7315611Z [W1204 09:50:48.208316733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7315614Z 2025-12-04T10:11:57.7315901Z [W1204 09:50:48.208499646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7315904Z 2025-12-04T10:11:57.7316259Z [W1204 09:50:48.213916809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7316263Z 2025-12-04T10:11:57.7316550Z [W1204 09:50:48.214453989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7316588Z 2025-12-04T10:11:57.7316884Z [W1204 09:50:48.214613381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7316888Z 2025-12-04T10:11:57.7316969Z ('RERUN', {'yellow': True}) [10.9480s] [100%] 2025-12-04T10:11:57.7317827Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:49.007377873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7317832Z 2025-12-04T10:11:57.7318138Z [W1204 09:50:49.007901652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7318142Z 2025-12-04T10:11:57.7318430Z [W1204 09:50:49.008040845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7318437Z 2025-12-04T10:11:57.7318730Z [W1204 09:50:49.010930274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7318734Z 2025-12-04T10:11:57.7319022Z [W1204 09:50:49.011378841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7319025Z 2025-12-04T10:11:57.7319316Z [W1204 09:50:49.011515724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7319320Z 2025-12-04T10:11:57.7319610Z [W1204 09:50:49.015890909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7319613Z 2025-12-04T10:11:57.7319972Z [W1204 09:50:49.016341796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7320052Z 2025-12-04T10:11:57.7320359Z [W1204 09:50:49.016478519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7320363Z 2025-12-04T10:11:57.7320444Z ('RERUN', {'yellow': True}) [0.4940s] [100%] 2025-12-04T10:11:57.7321179Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:49.500062184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7321183Z 2025-12-04T10:11:57.7321478Z [W1204 09:50:49.500628053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7321482Z 2025-12-04T10:11:57.7321790Z [W1204 09:50:49.500773166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7321797Z 2025-12-04T10:11:57.7322086Z [W1204 09:50:49.503631425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7322089Z 2025-12-04T10:11:57.7322384Z [W1204 09:50:49.504083113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7322387Z 2025-12-04T10:11:57.7322674Z [W1204 09:50:49.504222095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7322677Z 2025-12-04T10:11:57.7323084Z [W1204 09:50:49.508700612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7323088Z 2025-12-04T10:11:57.7323382Z [W1204 09:50:49.509156770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7323431Z 2025-12-04T10:11:57.7323728Z [W1204 09:50:49.509297213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7323731Z 2025-12-04T10:11:57.7323793Z FAILED [0.4881s] [100%] 2025-12-04T10:11:57.7323796Z 2025-12-04T10:11:57.7323880Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7324181Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7324260Z Traceback (most recent call last): 2025-12-04T10:11:57.7328875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7328979Z method(*args, **kwargs) 2025-12-04T10:11:57.7329325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7329402Z method(*args, **kwargs) 2025-12-04T10:11:57.7329728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7329796Z with policy(): 2025-12-04T10:11:57.7330103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7330171Z raise RuntimeError(msg) 2025-12-04T10:11:57.7330988Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7330993Z 2025-12-04T10:11:57.7331131Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7331659Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7331736Z 2025-12-04T10:11:57.7331904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7332043Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7332140Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7332698Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7332835Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7332893Z graph_break [] 2025-12-04T10:11:57.7333022Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7333730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7333804Z if out == self.unknown_value: 2025-12-04T10:11:57.7334105Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7334193Z Traceback (most recent call last): 2025-12-04T10:11:57.7334513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7334658Z method(*args, **kwargs) 2025-12-04T10:11:57.7334953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7335017Z method(*args, **kwargs) 2025-12-04T10:11:57.7335340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7335406Z with policy(): 2025-12-04T10:11:57.7335715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7335784Z raise RuntimeError(msg) 2025-12-04T10:11:57.7336596Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7336604Z 2025-12-04T10:11:57.7336735Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7337252Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7337262Z 2025-12-04T10:11:57.7337424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7337553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7337650Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7338203Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7338333Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7338396Z graph_break [] 2025-12-04T10:11:57.7338523Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7339216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7339327Z if out == self.unknown_value: 2025-12-04T10:11:57.7339450Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7339546Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7339666Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7340208Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7340267Z graph_break [] 2025-12-04T10:11:57.7340349Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7340641Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7340728Z Traceback (most recent call last): 2025-12-04T10:11:57.7341034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7341101Z method(*args, **kwargs) 2025-12-04T10:11:57.7341391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7341459Z method(*args, **kwargs) 2025-12-04T10:11:57.7341749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7341876Z with policy(): 2025-12-04T10:11:57.7342179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7342245Z raise RuntimeError(msg) 2025-12-04T10:11:57.7343092Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7343097Z 2025-12-04T10:11:57.7343226Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7343744Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7343752Z 2025-12-04T10:11:57.7343913Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7344040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7344134Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7344680Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7344806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7344867Z graph_break [] 2025-12-04T10:11:57.7344988Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7345677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7345745Z if out == self.unknown_value: 2025-12-04T10:11:57.7345867Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7345959Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7346128Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7346669Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7346730Z graph_break [] 2025-12-04T10:11:57.7346851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7346942Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7347060Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7347599Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7347657Z graph_break [] 2025-12-04T10:11:57.7348155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml - 2025-12-04T10:11:57.7348260Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7349606Z FAILED [0.4881s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7349612Z 2025-12-04T10:11:57.7349739Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7350286Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7350292Z 2025-12-04T10:11:57.7350450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7350554Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7350670Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.7350731Z Got exit code 1 2025-12-04T10:11:57.7351200Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7351446Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7351711Z W1204 09:50:56.102000 51658 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7352098Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml 2025-12-04T10:11:57.7352199Z ============================= test session starts ============================== 2025-12-04T10:11:57.7352406Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7352472Z cachedir: .pytest_cache 2025-12-04T10:11:57.7352782Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7352861Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7352930Z configfile: pytest.ini 2025-12-04T10:11:57.7353241Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7353370Z collecting ... collected 58 items / 31 deselected / 27 selected 2025-12-04T10:11:57.7353503Z stepcurrent: skipping 31 already run items. 2025-12-04T10:11:57.7353573Z Running 27 items in this shard 2025-12-04T10:11:57.7353577Z 2025-12-04T10:11:57.7354086Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0122s] [ 3%] 2025-12-04T10:11:57.7354575Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6090s] [ 3%] 2025-12-04T10:11:57.7355019Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6100s] [ 3%] 2025-12-04T10:11:57.7355023Z 2025-12-04T10:11:57.7355108Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7355403Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7355478Z Traceback (most recent call last): 2025-12-04T10:11:57.7355790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7355855Z method(*args, **kwargs) 2025-12-04T10:11:57.7356159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.7356222Z method(*args, **kwargs) 2025-12-04T10:11:57.7356602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7356662Z with policy(): 2025-12-04T10:11:57.7356954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7357058Z raise RuntimeError(msg) 2025-12-04T10:11:57.7357863Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7357867Z 2025-12-04T10:11:57.7358001Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7358528Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7358532Z 2025-12-04T10:11:57.7358690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7358824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7358920Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7359277Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7359405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7359463Z graph_break [] 2025-12-04T10:11:57.7359755Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.7359827Z Traceback (most recent call last): 2025-12-04T10:11:57.7360193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7360260Z method(*args, **kwargs) 2025-12-04T10:11:57.7360548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7360614Z method(*args, **kwargs) 2025-12-04T10:11:57.7360958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7361017Z with policy(): 2025-12-04T10:11:57.7361310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7361375Z raise RuntimeError(msg) 2025-12-04T10:11:57.7362202Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7362206Z 2025-12-04T10:11:57.7362329Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7362848Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7362858Z 2025-12-04T10:11:57.7363012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7363137Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7363233Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7363583Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7363775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7363840Z graph_break [] 2025-12-04T10:11:57.7363961Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7364052Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7364208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7364558Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7364620Z graph_break [] 2025-12-04T10:11:57.7364704Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7364995Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7365070Z Traceback (most recent call last): 2025-12-04T10:11:57.7365371Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7365436Z method(*args, **kwargs) 2025-12-04T10:11:57.7365726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7365791Z method(*args, **kwargs) 2025-12-04T10:11:57.7366081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7366140Z with policy(): 2025-12-04T10:11:57.7366429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7366499Z raise RuntimeError(msg) 2025-12-04T10:11:57.7367325Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7367329Z 2025-12-04T10:11:57.7367453Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7367972Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7368013Z 2025-12-04T10:11:57.7368169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7368291Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7368386Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7368730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7368854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7368912Z graph_break [] 2025-12-04T10:11:57.7369031Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7369117Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7369241Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7369590Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7369648Z graph_break [] 2025-12-04T10:11:57.7369773Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7369859Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7369982Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7370384Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7370443Z graph_break [] 2025-12-04T10:11:57.7370934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml - 2025-12-04T10:11:57.7371068Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7372373Z FAILED [0.6100s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7372380Z 2025-12-04T10:11:57.7372502Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7373018Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7373025Z 2025-12-04T10:11:57.7373179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7373282Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7373401Z ================== 1 failed, 31 deselected, 2 rerun in 3.26s =================== 2025-12-04T10:11:57.7373458Z Got exit code 1 2025-12-04T10:11:57.7373522Z Retrying single test... 
2025-12-04T10:11:57.7373791Z W1204 09:51:05.938000 51847 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7374181Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml 2025-12-04T10:11:57.7374279Z ============================= test session starts ============================== 2025-12-04T10:11:57.7374483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7374598Z cachedir: .pytest_cache 2025-12-04T10:11:57.7374909Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7374985Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7375052Z configfile: pytest.ini 2025-12-04T10:11:57.7375364Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7375490Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7376063Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7376136Z Running 1 items in this shard 2025-12-04T10:11:57.7376139Z 2025-12-04T10:11:57.7376877Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:07.158767824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7376883Z 2025-12-04T10:11:57.7377179Z [W1204 09:51:16.355214618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7377183Z 2025-12-04T10:11:57.7377479Z [W1204 09:51:16.355463442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7377554Z 2025-12-04T10:11:57.7377842Z [W1204 09:51:16.361361582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7377845Z 2025-12-04T10:11:57.7378134Z [W1204 09:51:16.361930962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7378176Z 2025-12-04T10:11:57.7378474Z [W1204 09:51:16.362097215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7378478Z 2025-12-04T10:11:57.7378764Z [W1204 09:51:16.367490917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7378768Z 2025-12-04T10:11:57.7379058Z [W1204 09:51:16.368040227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7379062Z 2025-12-04T10:11:57.7379350Z [W1204 09:51:16.368212370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7379354Z 2025-12-04T10:11:57.7379440Z ('RERUN', {'yellow': True}) [11.2211s] [100%] 2025-12-04T10:11:57.7380167Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:17.705141106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7380172Z 2025-12-04T10:11:57.7380462Z [W1204 09:51:17.705657125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7380466Z 2025-12-04T10:11:57.7380752Z [W1204 09:51:17.705793807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7380755Z 2025-12-04T10:11:57.7381049Z [W1204 09:51:17.708647926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7381052Z 2025-12-04T10:11:57.7381339Z [W1204 09:51:17.709191725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7381380Z 2025-12-04T10:11:57.7381670Z [W1204 09:51:17.709329258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7381673Z 2025-12-04T10:11:57.7381967Z [W1204 09:51:17.713756273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7381970Z 2025-12-04T10:11:57.7382255Z [W1204 09:51:17.714209641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7382258Z 2025-12-04T10:11:57.7382548Z [W1204 09:51:17.714350903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7382551Z 2025-12-04T10:11:57.7382629Z ('RERUN', {'yellow': True}) [0.5823s] [100%] 2025-12-04T10:11:57.7383359Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:18.285017860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7383369Z 2025-12-04T10:11:57.7383661Z [W1204 09:51:18.285530789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7383664Z 2025-12-04T10:11:57.7383953Z [W1204 09:51:18.285669752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7383956Z 2025-12-04T10:11:57.7384324Z [W1204 09:51:18.288483690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7384328Z 2025-12-04T10:11:57.7384615Z [W1204 09:51:18.289018100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7384655Z 2025-12-04T10:11:57.7384944Z [W1204 09:51:18.289159312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7384947Z 2025-12-04T10:11:57.7385233Z [W1204 09:51:18.293565987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7385236Z 2025-12-04T10:11:57.7385524Z [W1204 09:51:18.294018995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7385528Z 2025-12-04T10:11:57.7385817Z [W1204 09:51:18.294159357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7385820Z 2025-12-04T10:11:57.7385882Z FAILED [0.5787s] [100%] 2025-12-04T10:11:57.7385885Z 2025-12-04T10:11:57.7385965Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7386267Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7386348Z Traceback (most recent call last): 2025-12-04T10:11:57.7386655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7386723Z method(*args, **kwargs) 2025-12-04T10:11:57.7387016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7387080Z method(*args, **kwargs) 2025-12-04T10:11:57.7387372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7387430Z with policy(): 2025-12-04T10:11:57.7387729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7387832Z raise RuntimeError(msg) 2025-12-04T10:11:57.7388639Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7388643Z 2025-12-04T10:11:57.7388773Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7389296Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7389300Z 2025-12-04T10:11:57.7389458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7389582Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7389678Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7390028Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7390152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7390210Z graph_break [] 2025-12-04T10:11:57.7390331Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7391098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7391176Z if out == self.unknown_value: 2025-12-04T10:11:57.7391470Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7391580Z Traceback (most recent call last): 2025-12-04T10:11:57.7391884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7391949Z method(*args, **kwargs) 2025-12-04T10:11:57.7392245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7392307Z method(*args, **kwargs) 2025-12-04T10:11:57.7392595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7392658Z with policy(): 2025-12-04T10:11:57.7392955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7393022Z raise RuntimeError(msg) 2025-12-04T10:11:57.7393847Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7393853Z 2025-12-04T10:11:57.7393984Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7394506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7394510Z 2025-12-04T10:11:57.7394671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7394804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7394898Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7395247Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7395413Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7395473Z graph_break [] 2025-12-04T10:11:57.7395598Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7396287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7396356Z if out == self.unknown_value: 2025-12-04T10:11:57.7396496Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7396587Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7396714Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7397059Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7397117Z graph_break [] 2025-12-04T10:11:57.7397203Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7397494Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7397571Z Traceback (most recent call last): 2025-12-04T10:11:57.7397868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7398075Z method(*args, **kwargs) 2025-12-04T10:11:57.7398371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7398432Z method(*args, **kwargs) 2025-12-04T10:11:57.7398755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7398819Z with policy(): 2025-12-04T10:11:57.7399108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7399176Z raise RuntimeError(msg) 2025-12-04T10:11:57.7400039Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7400044Z 2025-12-04T10:11:57.7400173Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7400694Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7400701Z 2025-12-04T10:11:57.7400857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7400981Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7401072Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7401414Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7401540Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7401602Z graph_break [] 2025-12-04T10:11:57.7401726Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7402413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7402521Z if out == self.unknown_value: 2025-12-04T10:11:57.7402647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7402733Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7402859Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7403211Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7403268Z graph_break [] 2025-12-04T10:11:57.7403395Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7403483Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7403603Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7403943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7404010Z graph_break [] 2025-12-04T10:11:57.7404502Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml - 2025-12-04T10:11:57.7404603Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7405975Z FAILED [0.5787s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7406013Z 2025-12-04T10:11:57.7406138Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7406655Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7406663Z 2025-12-04T10:11:57.7406818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7406924Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7407045Z ================== 1 failed, 57 deselected, 2 rerun in 12.41s ================== 2025-12-04T10:11:57.7407102Z Got exit code 1 2025-12-04T10:11:57.7407166Z Retrying single test... 2025-12-04T10:11:57.7407432Z W1204 09:51:24.943000 52041 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7407820Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml 2025-12-04T10:11:57.7407918Z ============================= test session starts ============================== 2025-12-04T10:11:57.7408125Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7408188Z cachedir: .pytest_cache 2025-12-04T10:11:57.7408501Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7408574Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7408640Z configfile: pytest.ini 2025-12-04T10:11:57.7408957Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7409087Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7409661Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7409787Z Running 1 items in this shard 2025-12-04T10:11:57.7409791Z 2025-12-04T10:11:57.7410524Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:26.185574450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7410532Z 2025-12-04T10:11:57.7410829Z [W1204 09:51:35.118662316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7410832Z 2025-12-04T10:11:57.7411122Z [W1204 09:51:35.118925991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7411127Z 2025-12-04T10:11:57.7411417Z [W1204 09:51:35.124850582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7411420Z 2025-12-04T10:11:57.7411705Z [W1204 09:51:35.125451013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7411708Z 2025-12-04T10:11:57.7411997Z [W1204 09:51:35.125622185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7412001Z 2025-12-04T10:11:57.7412350Z [W1204 09:51:35.131093849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7412353Z 2025-12-04T10:11:57.7412641Z [W1204 09:51:35.131660768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7412678Z 2025-12-04T10:11:57.7412963Z [W1204 09:51:35.131835251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7412966Z 2025-12-04T10:11:57.7413050Z ('RERUN', {'yellow': True}) [10.9828s] [100%] 2025-12-04T10:11:57.7413771Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:36.482674704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7413775Z 2025-12-04T10:11:57.7414069Z [W1204 09:51:36.483206613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7414078Z 2025-12-04T10:11:57.7414363Z [W1204 09:51:36.483346745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7414369Z 2025-12-04T10:11:57.7414653Z [W1204 09:51:36.486285115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7414658Z 2025-12-04T10:11:57.7414962Z [W1204 09:51:36.486847955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7414965Z 2025-12-04T10:11:57.7415250Z [W1204 09:51:36.486987718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7415253Z 2025-12-04T10:11:57.7415543Z [W1204 09:51:36.491620847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7415546Z 2025-12-04T10:11:57.7415832Z [W1204 09:51:36.492083745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7415837Z 2025-12-04T10:11:57.7416164Z [W1204 09:51:36.492221447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7416167Z 2025-12-04T10:11:57.7416246Z ('RERUN', {'yellow': True}) [0.5971s] [100%] 2025-12-04T10:11:57.7416968Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:37.077498584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7416972Z 2025-12-04T10:11:57.7417453Z [W1204 09:51:37.078036213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7417457Z 2025-12-04T10:11:57.7417746Z [W1204 09:51:37.078175936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7417753Z 2025-12-04T10:11:57.7418048Z [W1204 09:51:37.081150827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7418050Z 2025-12-04T10:11:57.7418335Z [W1204 09:51:37.081715466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7418338Z 2025-12-04T10:11:57.7418626Z [W1204 09:51:37.081853999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7418629Z 2025-12-04T10:11:57.7418988Z [W1204 09:51:37.086418527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7418992Z 2025-12-04T10:11:57.7419283Z [W1204 09:51:37.086877484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7419319Z 2025-12-04T10:11:57.7419607Z [W1204 09:51:37.087014867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7419611Z 2025-12-04T10:11:57.7419672Z FAILED [0.5975s] [100%] 2025-12-04T10:11:57.7419676Z 2025-12-04T10:11:57.7419762Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7420056Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7420135Z Traceback (most recent call last): 2025-12-04T10:11:57.7420442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7420506Z method(*args, **kwargs) 2025-12-04T10:11:57.7420807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7420868Z method(*args, **kwargs) 2025-12-04T10:11:57.7421161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7421221Z with policy(): 2025-12-04T10:11:57.7421515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7421582Z raise RuntimeError(msg) 2025-12-04T10:11:57.7422391Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7422396Z 2025-12-04T10:11:57.7422526Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7423046Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7423089Z 2025-12-04T10:11:57.7423247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7423376Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7423470Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7423821Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7423951Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7424008Z graph_break [] 2025-12-04T10:11:57.7424136Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7424832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7424911Z if out == self.unknown_value: 2025-12-04T10:11:57.7425203Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7425275Z Traceback (most recent call last): 2025-12-04T10:11:57.7425573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7425635Z method(*args, **kwargs) 2025-12-04T10:11:57.7425994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7426059Z method(*args, **kwargs) 2025-12-04T10:11:57.7426354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7426449Z with policy(): 2025-12-04T10:11:57.7426744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7426808Z raise RuntimeError(msg) 2025-12-04T10:11:57.7427641Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7427645Z 2025-12-04T10:11:57.7427772Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7428296Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7428301Z 2025-12-04T10:11:57.7428456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7428582Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7428679Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7429039Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7429169Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7429227Z graph_break [] 2025-12-04T10:11:57.7429346Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7430040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7430109Z if out == self.unknown_value: 2025-12-04T10:11:57.7430276Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7430366Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7430488Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7430835Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7430893Z graph_break [] 2025-12-04T10:11:57.7430976Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7431271Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7431343Z Traceback (most recent call last): 2025-12-04T10:11:57.7431640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7431705Z method(*args, **kwargs) 2025-12-04T10:11:57.7431996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7432063Z method(*args, **kwargs) 2025-12-04T10:11:57.7432349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7432410Z with policy(): 2025-12-04T10:11:57.7432698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7432762Z raise RuntimeError(msg) 2025-12-04T10:11:57.7433653Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7433710Z 2025-12-04T10:11:57.7433835Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7434355Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7434359Z 2025-12-04T10:11:57.7434513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7434633Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7434729Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7435070Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7435193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7435254Z graph_break [] 2025-12-04T10:11:57.7435375Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7436059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7436127Z if out == self.unknown_value: 2025-12-04T10:11:57.7436250Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7436339Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7436460Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7436801Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7436860Z graph_break [] 2025-12-04T10:11:57.7437031Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7437125Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7437244Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7437584Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7437640Z graph_break [] 2025-12-04T10:11:57.7438128Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml - 2025-12-04T10:11:57.7438232Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7439535Z FAILED [0.5975s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7439543Z 2025-12-04T10:11:57.7439669Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7440292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7440297Z 2025-12-04T10:11:57.7440456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7440561Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7440712Z ================== 1 failed, 57 deselected, 2 rerun in 12.20s ================== 2025-12-04T10:11:57.7440774Z Got exit code 1 2025-12-04T10:11:57.7441245Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7441498Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7441771Z W1204 09:51:43.709000 52235 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7442160Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml 2025-12-04T10:11:57.7442259Z ============================= test session starts ============================== 2025-12-04T10:11:57.7442464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7442535Z cachedir: .pytest_cache 2025-12-04T10:11:57.7442837Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7442914Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7442981Z configfile: 
pytest.ini 2025-12-04T10:11:57.7443293Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7443417Z collecting ... collected 58 items / 32 deselected / 26 selected 2025-12-04T10:11:57.7443507Z stepcurrent: skipping 32 already run items. 2025-12-04T10:11:57.7443578Z Running 26 items in this shard 2025-12-04T10:11:57.7443581Z 2025-12-04T10:11:57.7444080Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8613s] [ 3%] 2025-12-04T10:11:57.7444602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4565s] [ 3%] 2025-12-04T10:11:57.7445040Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4424s] [ 3%] 2025-12-04T10:11:57.7445047Z 2025-12-04T10:11:57.7445130Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7445418Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7445492Z Traceback (most recent call last): 2025-12-04T10:11:57.7445796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7445861Z method(*args, **kwargs) 2025-12-04T10:11:57.7446170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7446235Z method(*args, **kwargs) 2025-12-04T10:11:57.7446530Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7446587Z with policy(): 2025-12-04T10:11:57.7446881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7446949Z raise RuntimeError(msg) 2025-12-04T10:11:57.7447811Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7447848Z 2025-12-04T10:11:57.7447977Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7448506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7448510Z 2025-12-04T10:11:57.7448666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7448792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7448883Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7449236Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7449359Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7449416Z graph_break [] 2025-12-04T10:11:57.7449705Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7449778Z Traceback (most recent call last): 2025-12-04T10:11:57.7450072Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7450134Z method(*args, **kwargs) 2025-12-04T10:11:57.7450420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7450484Z method(*args, **kwargs) 2025-12-04T10:11:57.7450774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7450831Z with policy(): 2025-12-04T10:11:57.7451124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7451188Z raise RuntimeError(msg) 2025-12-04T10:11:57.7451999Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7452041Z 2025-12-04T10:11:57.7452167Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7452681Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7452691Z 2025-12-04T10:11:57.7452843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7452965Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7453058Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7453404Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7453528Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7453589Z graph_break [] 2025-12-04T10:11:57.7453710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7453798Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7453918Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7454322Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7454384Z graph_break [] 2025-12-04T10:11:57.7454466Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7454786Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7454859Z Traceback (most recent call last): 2025-12-04T10:11:57.7455151Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7455221Z method(*args, **kwargs) 2025-12-04T10:11:57.7455514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7455576Z method(*args, **kwargs) 2025-12-04T10:11:57.7455868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7455926Z with policy(): 2025-12-04T10:11:57.7456220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7456284Z raise RuntimeError(msg) 2025-12-04T10:11:57.7457095Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7457100Z 2025-12-04T10:11:57.7457223Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7457732Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7457739Z 2025-12-04T10:11:57.7457905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7458026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7458115Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7458497Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7458617Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7458677Z graph_break [] 2025-12-04T10:11:57.7458798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7458889Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7459008Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7459352Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7459408Z graph_break [] 2025-12-04T10:11:57.7459532Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7459619Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7459743Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7460078Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7460135Z graph_break [] 2025-12-04T10:11:57.7460637Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml - 2025-12-04T10:11:57.7460737Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7462111Z FAILED [0.4424s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7462149Z 2025-12-04T10:11:57.7462271Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7462793Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7462796Z 2025-12-04T10:11:57.7462950Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7463056Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7463174Z ================== 1 failed, 32 deselected, 2 rerun in 2.78s =================== 2025-12-04T10:11:57.7463230Z Got exit code 1 2025-12-04T10:11:57.7463296Z Retrying single test... 
2025-12-04T10:11:57.7463557Z W1204 09:51:53.379000 52423 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7463940Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml 2025-12-04T10:11:57.7464035Z ============================= test session starts ============================== 2025-12-04T10:11:57.7464238Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7464302Z cachedir: .pytest_cache 2025-12-04T10:11:57.7464611Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7464685Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7464751Z configfile: pytest.ini 2025-12-04T10:11:57.7465062Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7465231Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7465798Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7465867Z Running 1 items in this shard 2025-12-04T10:11:57.7465871Z 2025-12-04T10:11:57.7466611Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:51:54.425539452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7466615Z 2025-12-04T10:11:57.7466911Z [W1204 09:52:03.549824206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7466917Z 2025-12-04T10:11:57.7467207Z [W1204 09:52:03.550103530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7467211Z 2025-12-04T10:11:57.7467497Z [W1204 09:52:03.555824129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7467500Z 2025-12-04T10:11:57.7467785Z [W1204 09:52:03.556380909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7467792Z 2025-12-04T10:11:57.7468144Z [W1204 09:52:03.556540232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7468147Z 2025-12-04T10:11:57.7468434Z [W1204 09:52:03.562111207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7468469Z 2025-12-04T10:11:57.7468760Z [W1204 09:52:03.562628436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7468763Z 2025-12-04T10:11:57.7469050Z [W1204 09:52:03.562788309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7469053Z 2025-12-04T10:11:57.7469134Z ('RERUN', {'yellow': True}) [10.9813s] [100%] 2025-12-04T10:11:57.7469855Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:04.741666298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7469860Z 2025-12-04T10:11:57.7470150Z [W1204 09:52:04.742232758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7470155Z 2025-12-04T10:11:57.7470442Z [W1204 09:52:04.742372350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7470445Z 2025-12-04T10:11:57.7470733Z [W1204 09:52:04.745420683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7470737Z 2025-12-04T10:11:57.7471022Z [W1204 09:52:04.745986482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7471025Z 2025-12-04T10:11:57.7471312Z [W1204 09:52:04.746124225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7471319Z 2025-12-04T10:11:57.7471602Z [W1204 09:52:04.750853946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7471607Z 2025-12-04T10:11:57.7471892Z [W1204 09:52:04.751328104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7471931Z 2025-12-04T10:11:57.7472221Z [W1204 09:52:04.751465847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7472224Z 2025-12-04T10:11:57.7472302Z ('RERUN', {'yellow': True}) [0.4195s] [100%] 2025-12-04T10:11:57.7473020Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:05.158249477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7473024Z 2025-12-04T10:11:57.7473311Z [W1204 09:52:05.158784916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7473315Z 2025-12-04T10:11:57.7473604Z [W1204 09:52:05.158927598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7473608Z 2025-12-04T10:11:57.7473892Z [W1204 09:52:05.161956730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7473895Z 2025-12-04T10:11:57.7474186Z [W1204 09:52:05.162518670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7474189Z 2025-12-04T10:11:57.7474539Z [W1204 09:52:05.162655162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7474543Z 2025-12-04T10:11:57.7474828Z [W1204 09:52:05.167303161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7474831Z 2025-12-04T10:11:57.7475152Z [W1204 09:52:05.167768220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7475157Z 2025-12-04T10:11:57.7475440Z [W1204 09:52:05.167903572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7475443Z 2025-12-04T10:11:57.7475508Z FAILED [0.4160s] [100%] 2025-12-04T10:11:57.7475511Z 2025-12-04T10:11:57.7475593Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7475903Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7475988Z Traceback (most recent call last): 2025-12-04T10:11:57.7476296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7476365Z method(*args, **kwargs) 2025-12-04T10:11:57.7476658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7476722Z method(*args, **kwargs) 2025-12-04T10:11:57.7477019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7477077Z with policy(): 2025-12-04T10:11:57.7477371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7477436Z raise RuntimeError(msg) 2025-12-04T10:11:57.7478233Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7478237Z 2025-12-04T10:11:57.7478366Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7478924Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7478928Z 2025-12-04T10:11:57.7479091Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7479217Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7479310Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7479661Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7479786Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7479844Z graph_break [] 2025-12-04T10:11:57.7480017Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7480721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7480794Z if out == self.unknown_value: 2025-12-04T10:11:57.7481080Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7481154Z Traceback (most recent call last): 2025-12-04T10:11:57.7481519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7481583Z method(*args, **kwargs) 2025-12-04T10:11:57.7481877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7481939Z method(*args, **kwargs) 2025-12-04T10:11:57.7482263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7482326Z with policy(): 2025-12-04T10:11:57.7482616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7482684Z raise RuntimeError(msg) 2025-12-04T10:11:57.7483494Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7483499Z 2025-12-04T10:11:57.7483628Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7484143Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7484150Z 2025-12-04T10:11:57.7484304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7484443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7484536Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7484885Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7485011Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7485070Z graph_break [] 2025-12-04T10:11:57.7485195Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7485880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7485988Z if out == self.unknown_value: 2025-12-04T10:11:57.7486110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7486197Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7486321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7486660Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7486716Z graph_break [] 2025-12-04T10:11:57.7486804Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7487088Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7487164Z Traceback (most recent call last): 2025-12-04T10:11:57.7487460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7487521Z method(*args, **kwargs) 2025-12-04T10:11:57.7487812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7487873Z method(*args, **kwargs) 2025-12-04T10:11:57.7488158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7488217Z with policy(): 2025-12-04T10:11:57.7488589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7488657Z raise RuntimeError(msg) 2025-12-04T10:11:57.7489468Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7489506Z 2025-12-04T10:11:57.7489631Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7490150Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7490153Z 2025-12-04T10:11:57.7490307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7490444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7490536Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7490877Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7491003Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7491059Z graph_break [] 2025-12-04T10:11:57.7491185Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7491874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7491940Z if out == self.unknown_value: 2025-12-04T10:11:57.7492064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7492150Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7492273Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7492612Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7492708Z graph_break [] 2025-12-04T10:11:57.7492831Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7492915Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7493033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7493372Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7493428Z graph_break [] 2025-12-04T10:11:57.7493924Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml - 2025-12-04T10:11:57.7494026Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7495321Z FAILED [0.4160s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7495326Z 2025-12-04T10:11:57.7495450Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7496033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7496039Z 2025-12-04T10:11:57.7496194Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7496332Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7496451Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ================== 2025-12-04T10:11:57.7496508Z Got exit code 1 2025-12-04T10:11:57.7496570Z Retrying single test... 2025-12-04T10:11:57.7496834Z W1204 09:52:11.844000 52616 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7497217Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml 2025-12-04T10:11:57.7497313Z ============================= test session starts ============================== 2025-12-04T10:11:57.7497524Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7497588Z cachedir: .pytest_cache 2025-12-04T10:11:57.7497896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7497974Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7498038Z configfile: pytest.ini 2025-12-04T10:11:57.7498350Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7498478Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7499043Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7499115Z Running 1 items in this shard 2025-12-04T10:11:57.7499118Z 2025-12-04T10:11:57.7499839Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:12.904307739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7499889Z 2025-12-04T10:11:57.7500187Z [W1204 09:52:22.950981626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7500191Z 2025-12-04T10:11:57.7500478Z [W1204 09:52:22.951234980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7500484Z 2025-12-04T10:11:57.7500772Z [W1204 09:52:22.957031440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7500777Z 2025-12-04T10:11:57.7501062Z [W1204 09:52:22.957606810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7501065Z 2025-12-04T10:11:57.7501363Z [W1204 09:52:22.957772233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7501369Z 2025-12-04T10:11:57.7501657Z [W1204 09:52:22.963316978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7501660Z 2025-12-04T10:11:57.7501948Z [W1204 09:52:22.963840107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7501951Z 2025-12-04T10:11:57.7502235Z [W1204 09:52:22.963994529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7502239Z 2025-12-04T10:11:57.7502387Z ('RERUN', {'yellow': True}) [10.9042s] [100%] 2025-12-04T10:11:57.7503155Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:23.132051848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7503194Z 2025-12-04T10:11:57.7503517Z [W1204 09:52:23.132600648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7503524Z 2025-12-04T10:11:57.7503812Z [W1204 09:52:23.132748540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7503816Z 2025-12-04T10:11:57.7504103Z [W1204 09:52:23.135706361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7504107Z 2025-12-04T10:11:57.7504401Z [W1204 09:52:23.136279421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7504404Z 2025-12-04T10:11:57.7504691Z [W1204 09:52:23.136431143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7504699Z 2025-12-04T10:11:57.7504989Z [W1204 09:52:23.141022762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7504992Z 2025-12-04T10:11:57.7505278Z [W1204 09:52:23.141487710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7505281Z 2025-12-04T10:11:57.7505568Z [W1204 09:52:23.141625562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7505573Z 2025-12-04T10:11:57.7505653Z ('RERUN', {'yellow': True}) [0.4099s] [100%] 2025-12-04T10:11:57.7506370Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:23.540086527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7506419Z 2025-12-04T10:11:57.7506711Z [W1204 09:52:23.540629867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7506714Z 2025-12-04T10:11:57.7506999Z [W1204 09:52:23.540775709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7507005Z 2025-12-04T10:11:57.7507286Z [W1204 09:52:23.543710920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7507290Z 2025-12-04T10:11:57.7507576Z [W1204 09:52:23.544272889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7507579Z 2025-12-04T10:11:57.7507866Z [W1204 09:52:23.544420262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7507872Z 2025-12-04T10:11:57.7508156Z [W1204 09:52:23.549016781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7508159Z 2025-12-04T10:11:57.7508447Z [W1204 09:52:23.549478059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7508450Z 2025-12-04T10:11:57.7508734Z [W1204 09:52:23.549614761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7508738Z 2025-12-04T10:11:57.7508801Z FAILED [0.4061s] [100%] 2025-12-04T10:11:57.7508944Z 2025-12-04T10:11:57.7509026Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7509317Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7509429Z Traceback (most recent call last): 2025-12-04T10:11:57.7509736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7509802Z method(*args, **kwargs) 2025-12-04T10:11:57.7510094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7510157Z method(*args, **kwargs) 2025-12-04T10:11:57.7510446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7510504Z with policy(): 2025-12-04T10:11:57.7510799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7510865Z raise RuntimeError(msg) 2025-12-04T10:11:57.7511671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7511678Z 2025-12-04T10:11:57.7511810Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7512327Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7512331Z 2025-12-04T10:11:57.7512488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7512614Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7512711Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7513061Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7513252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7513310Z graph_break [] 2025-12-04T10:11:57.7513432Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7514126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7514196Z if out == self.unknown_value: 2025-12-04T10:11:57.7514484Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7514555Z Traceback (most recent call last): 2025-12-04T10:11:57.7514852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7514916Z method(*args, **kwargs) 2025-12-04T10:11:57.7515208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7515269Z method(*args, **kwargs) 2025-12-04T10:11:57.7515554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7515616Z with policy(): 2025-12-04T10:11:57.7515906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7515971Z raise RuntimeError(msg) 2025-12-04T10:11:57.7516856Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7516894Z 2025-12-04T10:11:57.7517180Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7517709Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7517713Z 2025-12-04T10:11:57.7517865Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7517994Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7518087Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7518435Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7518562Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7518631Z graph_break [] 2025-12-04T10:11:57.7518759Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7519448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7519516Z if out == self.unknown_value: 2025-12-04T10:11:57.7519641Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7519728Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7519852Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7520233Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7520291Z graph_break [] 2025-12-04T10:11:57.7520454Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7520741Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7520812Z Traceback (most recent call last): 2025-12-04T10:11:57.7521110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7521173Z method(*args, **kwargs) 2025-12-04T10:11:57.7521467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7521534Z method(*args, **kwargs) 2025-12-04T10:11:57.7521822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7521882Z with policy(): 2025-12-04T10:11:57.7522187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7522256Z raise RuntimeError(msg) 2025-12-04T10:11:57.7523070Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7523074Z 2025-12-04T10:11:57.7523196Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7523812Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7523816Z 2025-12-04T10:11:57.7523969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7524143Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7524232Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7524572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7524699Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7524755Z graph_break [] 2025-12-04T10:11:57.7524877Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7525568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7525635Z if out == self.unknown_value: 2025-12-04T10:11:57.7525766Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7525855Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7525974Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7526316Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7526373Z graph_break [] 2025-12-04T10:11:57.7526495Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7526582Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7526704Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7527045Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7527103Z graph_break [] 2025-12-04T10:11:57.7527630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml - 2025-12-04T10:11:57.7527731Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7529015Z FAILED [0.4061s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7529027Z 2025-12-04T10:11:57.7529150Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7529664Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7529669Z 2025-12-04T10:11:57.7529823Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7529925Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7530042Z ================== 1 failed, 57 deselected, 2 rerun in 11.74s ================== 2025-12-04T10:11:57.7530111Z Got exit code 1 2025-12-04T10:11:57.7530649Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7530896Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7531161Z W1204 09:52:30.184000 52809 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7531581Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml 2025-12-04T10:11:57.7531678Z ============================= test session starts ============================== 2025-12-04T10:11:57.7531883Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7531949Z cachedir: .pytest_cache 2025-12-04T10:11:57.7532251Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7532327Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7532394Z configfile: pytest.ini 
2025-12-04T10:11:57.7532706Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7532842Z collecting ... collected 58 items / 33 deselected / 25 selected 2025-12-04T10:11:57.7532930Z stepcurrent: skipping 33 already run items. 2025-12-04T10:11:57.7532997Z Running 25 items in this shard 2025-12-04T10:11:57.7533002Z 2025-12-04T10:11:57.7533496Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9207s] [ 4%] 2025-12-04T10:11:57.7533980Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5358s] [ 4%] 2025-12-04T10:11:57.7534424Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5232s] [ 4%] 2025-12-04T10:11:57.7534428Z 2025-12-04T10:11:57.7534512Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7534842Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7534915Z Traceback (most recent call last): 2025-12-04T10:11:57.7535220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7535286Z method(*args, **kwargs) 2025-12-04T10:11:57.7535575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7535636Z method(*args, **kwargs) 2025-12-04T10:11:57.7535928Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7535989Z with policy(): 2025-12-04T10:11:57.7536280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7536348Z raise RuntimeError(msg) 2025-12-04T10:11:57.7537149Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7537153Z 2025-12-04T10:11:57.7537278Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7537859Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7537864Z 2025-12-04T10:11:57.7538023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7538158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7538302Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7538856Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7538980Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7539040Z graph_break [] 2025-12-04T10:11:57.7539324Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.7539394Z Traceback (most recent call last): 2025-12-04T10:11:57.7539702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7539765Z method(*args, **kwargs) 2025-12-04T10:11:57.7540058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7540129Z method(*args, **kwargs) 2025-12-04T10:11:57.7540415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7540475Z with policy(): 2025-12-04T10:11:57.7540765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7540829Z raise RuntimeError(msg) 2025-12-04T10:11:57.7541640Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7541644Z 2025-12-04T10:11:57.7541766Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7542287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7542329Z 2025-12-04T10:11:57.7542484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7542607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7542705Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7543250Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7543377Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7543435Z graph_break [] 2025-12-04T10:11:57.7543556Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7543650Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7543768Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7544311Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7544368Z graph_break [] 2025-12-04T10:11:57.7544448Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7544800Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7544875Z 
Traceback (most recent call last): 2025-12-04T10:11:57.7545178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7545276Z method(*args, **kwargs) 2025-12-04T10:11:57.7545568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7545632Z method(*args, **kwargs) 2025-12-04T10:11:57.7545920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7545977Z with policy(): 2025-12-04T10:11:57.7546270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7546335Z raise RuntimeError(msg) 2025-12-04T10:11:57.7547148Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7547154Z 2025-12-04T10:11:57.7547276Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7547788Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7547794Z 2025-12-04T10:11:57.7547954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7548079Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7548171Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7548709Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7548834Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7548930Z graph_break [] 2025-12-04T10:11:57.7549051Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7549142Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7549260Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7549793Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7549853Z graph_break [] 2025-12-04T10:11:57.7549978Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7550067Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7550186Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.7550720Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7550782Z graph_break [] 2025-12-04T10:11:57.7551267Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml - 2025-12-04T10:11:57.7551369Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7552717Z FAILED [0.5232s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7552755Z 2025-12-04T10:11:57.7552882Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7553395Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7553399Z 2025-12-04T10:11:57.7553549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7553657Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7553777Z ================== 1 failed, 33 deselected, 2 rerun in 3.00s =================== 2025-12-04T10:11:57.7553836Z Got exit code 1 2025-12-04T10:11:57.7553898Z Retrying single test... 
2025-12-04T10:11:57.7554160Z W1204 09:52:39.782000 52998 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7554550Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml 2025-12-04T10:11:57.7554643Z ============================= test session starts ============================== 2025-12-04T10:11:57.7554845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7554910Z cachedir: .pytest_cache 2025-12-04T10:11:57.7555215Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7555294Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7555358Z configfile: pytest.ini 2025-12-04T10:11:57.7555667Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7555802Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7556405Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7556480Z Running 1 items in this shard 2025-12-04T10:11:57.7556484Z 2025-12-04T10:11:57.7557206Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:52:41.370651197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7557210Z 2025-12-04T10:11:57.7557510Z [W1204 09:52:50.605282863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7557516Z 2025-12-04T10:11:57.7557813Z [W1204 09:52:50.605532867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7557819Z 2025-12-04T10:11:57.7558109Z [W1204 09:52:50.611415789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7558113Z 2025-12-04T10:11:57.7558400Z [W1204 09:52:50.612014989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7558403Z 2025-12-04T10:11:57.7558685Z [W1204 09:52:50.612200232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7558688Z 2025-12-04T10:11:57.7559042Z [W1204 09:52:50.617565303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7559046Z 2025-12-04T10:11:57.7559332Z [W1204 09:52:50.618088632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7559369Z 2025-12-04T10:11:57.7559654Z [W1204 09:52:50.618249745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7559657Z 2025-12-04T10:11:57.7559737Z ('RERUN', {'yellow': True}) [11.1748s] [100%] 2025-12-04T10:11:57.7560501Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:52:51.414759709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7560505Z 2025-12-04T10:11:57.7560795Z [W1204 09:52:51.415268477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7560798Z 2025-12-04T10:11:57.7561082Z [W1204 09:52:51.415409040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7561090Z 2025-12-04T10:11:57.7561373Z [W1204 09:52:51.418276519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7561376Z 2025-12-04T10:11:57.7561658Z [W1204 09:52:51.418727346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7561662Z 2025-12-04T10:11:57.7561949Z [W1204 09:52:51.418871509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7561952Z 2025-12-04T10:11:57.7562239Z [W1204 09:52:51.423353486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7562242Z 2025-12-04T10:11:57.7562527Z [W1204 09:52:51.423813924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7562570Z 2025-12-04T10:11:57.7562856Z [W1204 09:52:51.423949546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7562859Z 2025-12-04T10:11:57.7562937Z ('RERUN', {'yellow': True}) [0.4976s] [100%] 2025-12-04T10:11:57.7563654Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:52:51.910875157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7563659Z 2025-12-04T10:11:57.7563951Z [W1204 09:52:51.911383756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7563953Z 2025-12-04T10:11:57.7564239Z [W1204 09:52:51.911522879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7564245Z 2025-12-04T10:11:57.7564529Z [W1204 09:52:51.914423728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7564532Z 2025-12-04T10:11:57.7564820Z [W1204 09:52:51.914874276 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7564822Z 2025-12-04T10:11:57.7565107Z [W1204 09:52:51.915011578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7565110Z 2025-12-04T10:11:57.7565491Z [W1204 09:52:51.919514225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7565496Z 2025-12-04T10:11:57.7565785Z [W1204 09:52:51.919969053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7565824Z 2025-12-04T10:11:57.7566115Z [W1204 09:52:51.920165866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7566118Z 2025-12-04T10:11:57.7566177Z FAILED [0.4954s] [100%] 2025-12-04T10:11:57.7566180Z 2025-12-04T10:11:57.7566259Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7566548Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7566620Z Traceback (most recent call last): 2025-12-04T10:11:57.7566933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7566997Z method(*args, **kwargs) 2025-12-04T10:11:57.7567288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7567357Z method(*args, **kwargs) 2025-12-04T10:11:57.7567647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7567709Z with policy(): 2025-12-04T10:11:57.7568002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7568067Z raise RuntimeError(msg) 2025-12-04T10:11:57.7568869Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7568873Z 2025-12-04T10:11:57.7568998Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7569517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7569559Z 2025-12-04T10:11:57.7569715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7569841Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7569948Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7570496Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7570625Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7570680Z graph_break [] 2025-12-04T10:11:57.7570802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7571497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7571569Z if out == self.unknown_value: 2025-12-04T10:11:57.7571858Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7571928Z Traceback (most recent call last): 2025-12-04T10:11:57.7572221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7572353Z method(*args, **kwargs) 2025-12-04T10:11:57.7572643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7572703Z method(*args, **kwargs) 2025-12-04T10:11:57.7572996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7573089Z with policy(): 2025-12-04T10:11:57.7573387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7573451Z raise RuntimeError(msg) 2025-12-04T10:11:57.7574264Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7574274Z 2025-12-04T10:11:57.7574399Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7574915Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7574922Z 2025-12-04T10:11:57.7575077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7575200Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7575294Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7575834Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7575962Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7576021Z graph_break [] 2025-12-04T10:11:57.7576142Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7576836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7576943Z if out == self.unknown_value: 2025-12-04T10:11:57.7577064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7577156Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7577278Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7577818Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7577878Z graph_break [] 2025-12-04T10:11:57.7577958Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7578255Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7578330Z Traceback (most recent call last): 2025-12-04T10:11:57.7578627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7578692Z method(*args, **kwargs) 2025-12-04T10:11:57.7578980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7579051Z method(*args, **kwargs) 2025-12-04T10:11:57.7579338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7579464Z with policy(): 2025-12-04T10:11:57.7579759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7579825Z raise RuntimeError(msg) 2025-12-04T10:11:57.7580668Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7580676Z 2025-12-04T10:11:57.7580799Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7581310Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7581314Z 2025-12-04T10:11:57.7581473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7581595Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7581686Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7582227Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7582349Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7582410Z graph_break [] 2025-12-04T10:11:57.7582531Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7583223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7583291Z if out == self.unknown_value: 2025-12-04T10:11:57.7583410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7583505Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7583667Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7584202Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7584261Z graph_break [] 2025-12-04T10:11:57.7584380Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7584469Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7584591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7585123Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7585184Z graph_break [] 2025-12-04T10:11:57.7585672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml - 2025-12-04T10:11:57.7585777Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7587123Z FAILED [0.4954s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7587128Z 2025-12-04T10:11:57.7587254Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7587806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7587811Z 2025-12-04T10:11:57.7587963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7588070Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7588184Z ================== 1 failed, 57 deselected, 2 rerun in 12.19s ================== 2025-12-04T10:11:57.7588242Z Got exit code 1 2025-12-04T10:11:57.7588305Z Retrying single test... 2025-12-04T10:11:57.7588582Z W1204 09:52:58.559000 53192 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7588972Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml 2025-12-04T10:11:57.7589066Z ============================= test session starts ============================== 2025-12-04T10:11:57.7589276Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7589339Z cachedir: .pytest_cache 2025-12-04T10:11:57.7589643Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7589721Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7589783Z configfile: pytest.ini 2025-12-04T10:11:57.7590092Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7590228Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7590794Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7590905Z Running 1 items in this shard 2025-12-04T10:11:57.7590908Z 2025-12-04T10:11:57.7591632Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:53:00.167238164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7591636Z 2025-12-04T10:11:57.7591933Z [W1204 09:53:09.460927240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7591937Z 2025-12-04T10:11:57.7592227Z [W1204 09:53:09.461197744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7592230Z 2025-12-04T10:11:57.7592518Z [W1204 09:53:09.467883116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7592528Z 2025-12-04T10:11:57.7592814Z [W1204 09:53:09.468499537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7592817Z 2025-12-04T10:11:57.7593105Z [W1204 09:53:09.468686800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7593109Z 2025-12-04T10:11:57.7593397Z [W1204 09:53:09.474235973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7593401Z 2025-12-04T10:11:57.7593769Z [W1204 09:53:09.474777022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7593772Z 2025-12-04T10:11:57.7594074Z [W1204 09:53:09.474937585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7594110Z 2025-12-04T10:11:57.7594192Z ('RERUN', {'yellow': True}) [11.2609s] [100%] 2025-12-04T10:11:57.7594915Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:53:10.286934849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7594919Z 2025-12-04T10:11:57.7595205Z [W1204 09:53:10.287449728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7595210Z 2025-12-04T10:11:57.7595504Z [W1204 09:53:10.287594960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7595507Z 2025-12-04T10:11:57.7595793Z [W1204 09:53:10.290577620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7595799Z 2025-12-04T10:11:57.7596083Z [W1204 09:53:10.291043678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7596092Z 2025-12-04T10:11:57.7596376Z [W1204 09:53:10.291182290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7596380Z 2025-12-04T10:11:57.7596664Z [W1204 09:53:10.295685696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7596667Z 2025-12-04T10:11:57.7596956Z [W1204 09:53:10.296144103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7596959Z 2025-12-04T10:11:57.7597245Z [W1204 09:53:10.296281516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7597250Z 2025-12-04T10:11:57.7597368Z ('RERUN', {'yellow': True}) [0.5062s] [100%] 2025-12-04T10:11:57.7598088Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:53:10.789865098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7598091Z 2025-12-04T10:11:57.7598382Z [W1204 09:53:10.790402997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7598386Z 2025-12-04T10:11:57.7598675Z [W1204 09:53:10.790549509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7598677Z 2025-12-04T10:11:57.7598963Z [W1204 09:53:10.793484339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7598971Z 2025-12-04T10:11:57.7599257Z [W1204 09:53:10.793939006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7599260Z 2025-12-04T10:11:57.7599543Z [W1204 09:53:10.794078679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7599546Z 2025-12-04T10:11:57.7599834Z [W1204 09:53:10.798646965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7599837Z 2025-12-04T10:11:57.7600243Z [W1204 09:53:10.799110323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7600247Z 2025-12-04T10:11:57.7600539Z [W1204 09:53:10.799248985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7600542Z 2025-12-04T10:11:57.7600636Z FAILED [0.5010s] [100%] 2025-12-04T10:11:57.7600641Z 2025-12-04T10:11:57.7600724Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7601014Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7601086Z Traceback (most recent call last): 2025-12-04T10:11:57.7601395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7601460Z method(*args, **kwargs) 2025-12-04T10:11:57.7601754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7601819Z method(*args, **kwargs) 2025-12-04T10:11:57.7602107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7602167Z with policy(): 2025-12-04T10:11:57.7602460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7602527Z raise RuntimeError(msg) 2025-12-04T10:11:57.7603329Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7603334Z 2025-12-04T10:11:57.7603458Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7603977Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7603980Z 2025-12-04T10:11:57.7604136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7604301Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7604393Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7604938Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7605065Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7605121Z graph_break [] 2025-12-04T10:11:57.7605246Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7605941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7606012Z if out == self.unknown_value: 2025-12-04T10:11:57.7606304Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7606387Z Traceback (most recent call last): 2025-12-04T10:11:57.7606689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7606760Z method(*args, **kwargs) 2025-12-04T10:11:57.7607047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7607111Z method(*args, **kwargs) 2025-12-04T10:11:57.7607465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7607523Z with policy(): 2025-12-04T10:11:57.7607819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7607917Z raise RuntimeError(msg) 2025-12-04T10:11:57.7608729Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7608734Z 2025-12-04T10:11:57.7608856Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7609369Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7609373Z 2025-12-04T10:11:57.7609534Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7609656Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7609756Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7610295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7610421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7610479Z graph_break [] 2025-12-04T10:11:57.7610599Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7611293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7611362Z if out == self.unknown_value: 2025-12-04T10:11:57.7611484Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7611617Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7611740Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7612281Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7612337Z graph_break [] 2025-12-04T10:11:57.7612419Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7612711Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7612782Z Traceback (most recent call last): 2025-12-04T10:11:57.7613079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7613144Z method(*args, **kwargs) 2025-12-04T10:11:57.7613435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7613499Z method(*args, **kwargs) 2025-12-04T10:11:57.7613787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7613848Z with policy(): 2025-12-04T10:11:57.7614145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7614210Z raise RuntimeError(msg) 2025-12-04T10:11:57.7615089Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7615126Z 2025-12-04T10:11:57.7615251Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7615768Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7615776Z 2025-12-04T10:11:57.7615938Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7616062Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7616163Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7616700Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7616823Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7616887Z graph_break [] 2025-12-04T10:11:57.7617139Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7617835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7617900Z if out == self.unknown_value: 2025-12-04T10:11:57.7618023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7618118Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7618238Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7618776Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7618998Z graph_break [] 2025-12-04T10:11:57.7619120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7619211Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7619331Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7619863Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7619928Z graph_break [] 2025-12-04T10:11:57.7620411Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml - 2025-12-04T10:11:57.7620515Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7621799Z FAILED [0.5010s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7621804Z 2025-12-04T10:11:57.7621934Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7622541Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7622587Z 2025-12-04T10:11:57.7622744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7622850Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7622976Z ================== 1 failed, 57 deselected, 2 rerun in 12.29s ================== 2025-12-04T10:11:57.7623038Z Got exit code 1 2025-12-04T10:11:57.7623514Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7623754Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7624019Z W1204 09:53:17.444000 53386 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7624404Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml 2025-12-04T10:11:57.7624508Z ============================= test session starts ============================== 2025-12-04T10:11:57.7624715Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7624779Z cachedir: .pytest_cache 2025-12-04T10:11:57.7625090Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7625164Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7625231Z configfile: pytest.ini 2025-12-04T10:11:57.7625545Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7625670Z collecting ... collected 58 items / 34 deselected / 24 selected 2025-12-04T10:11:57.7625759Z stepcurrent: skipping 34 already run items. 2025-12-04T10:11:57.7625825Z Running 24 items in this shard 2025-12-04T10:11:57.7625829Z 2025-12-04T10:11:57.7626326Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8972s] [ 4%] 2025-12-04T10:11:57.7626852Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4919s] [ 4%] 2025-12-04T10:11:57.7627291Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4748s] [ 4%] 2025-12-04T10:11:57.7627294Z 2025-12-04T10:11:57.7627379Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7627668Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7627743Z Traceback (most recent call last): 2025-12-04T10:11:57.7628068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7628133Z method(*args, **kwargs) 2025-12-04T10:11:57.7628442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.7628504Z method(*args, **kwargs) 2025-12-04T10:11:57.7628799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7628856Z with policy(): 2025-12-04T10:11:57.7629222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7629291Z raise RuntimeError(msg) 2025-12-04T10:11:57.7630100Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7630139Z 2025-12-04T10:11:57.7630272Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7630794Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7630798Z 2025-12-04T10:11:57.7630957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7631092Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7631185Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7631539Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7631667Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7631725Z graph_break [] 2025-12-04T10:11:57.7632018Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.7632090Z Traceback (most recent call last): 2025-12-04T10:11:57.7632384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7632449Z method(*args, **kwargs) 2025-12-04T10:11:57.7632742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7632818Z method(*args, **kwargs) 2025-12-04T10:11:57.7633108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7633167Z with policy(): 2025-12-04T10:11:57.7633463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7633567Z raise RuntimeError(msg) 2025-12-04T10:11:57.7634390Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7634393Z 2025-12-04T10:11:57.7634520Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7635035Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7635042Z 2025-12-04T10:11:57.7635196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7635324Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7635417Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7635762Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7635884Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7635945Z graph_break [] 2025-12-04T10:11:57.7636069Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7636227Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7636348Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7636689Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7636804Z graph_break [] 2025-12-04T10:11:57.7636888Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7637177Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7637249Z Traceback (most recent call last): 2025-12-04T10:11:57.7637556Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7637621Z method(*args, **kwargs) 2025-12-04T10:11:57.7637922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7637983Z method(*args, **kwargs) 2025-12-04T10:11:57.7638272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7638330Z with policy(): 2025-12-04T10:11:57.7638625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7638692Z raise RuntimeError(msg) 2025-12-04T10:11:57.7639508Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7639512Z 2025-12-04T10:11:57.7639638Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7640230Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7640234Z 2025-12-04T10:11:57.7640392Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7640570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7640659Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7641005Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7641127Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7641187Z graph_break [] 2025-12-04T10:11:57.7641308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7641399Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7641520Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7641859Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7641918Z graph_break [] 2025-12-04T10:11:57.7642041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7642125Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7642245Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7642582Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7642639Z graph_break [] 2025-12-04T10:11:57.7643196Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml - 2025-12-04T10:11:57.7643299Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7644594Z FAILED [0.4748s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7644652Z 2025-12-04T10:11:57.7644774Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7645295Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7645298Z 2025-12-04T10:11:57.7645451Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7645555Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7645675Z ================== 1 failed, 34 deselected, 2 rerun in 2.89s =================== 2025-12-04T10:11:57.7645732Z Got exit code 1 2025-12-04T10:11:57.7645795Z Retrying single test... 
2025-12-04T10:11:57.7646060Z W1204 09:53:27.123000 53574 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7646450Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml 2025-12-04T10:11:57.7646546Z ============================= test session starts ============================== 2025-12-04T10:11:57.7646756Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7646826Z cachedir: .pytest_cache 2025-12-04T10:11:57.7647136Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7647249Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7647315Z configfile: pytest.ini 2025-12-04T10:11:57.7647630Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7647757Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7648327Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7648396Z Running 1 items in this shard 2025-12-04T10:11:57.7648399Z 2025-12-04T10:11:57.7649134Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:28.212665546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7649141Z 2025-12-04T10:11:57.7649439Z [W1204 09:53:37.279430147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7649443Z 2025-12-04T10:11:57.7649731Z [W1204 09:53:37.279682901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7649735Z 2025-12-04T10:11:57.7650041Z [W1204 09:53:37.285621283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7650045Z 2025-12-04T10:11:57.7650401Z [W1204 09:53:37.286209653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7650408Z 2025-12-04T10:11:57.7650705Z [W1204 09:53:37.286380036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7650743Z 2025-12-04T10:11:57.7651034Z [W1204 09:53:37.291869150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7651037Z 2025-12-04T10:11:57.7651329Z [W1204 09:53:37.292415789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7651332Z 2025-12-04T10:11:57.7651615Z [W1204 09:53:37.292574992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7651619Z 2025-12-04T10:11:57.7651699Z ('RERUN', {'yellow': True}) [10.9651s] [100%] 2025-12-04T10:11:57.7652419Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:38.501420760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7652426Z 2025-12-04T10:11:57.7652718Z [W1204 09:53:38.501939169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7652721Z 2025-12-04T10:11:57.7653008Z [W1204 09:53:38.502080992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7653011Z 2025-12-04T10:11:57.7653295Z [W1204 09:53:38.504921060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7653302Z 2025-12-04T10:11:57.7653589Z [W1204 09:53:38.505466709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7653592Z 2025-12-04T10:11:57.7653884Z [W1204 09:53:38.505606852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7653888Z 2025-12-04T10:11:57.7654213Z [W1204 09:53:38.510038907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7654216Z 2025-12-04T10:11:57.7654500Z [W1204 09:53:38.510495595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7654503Z 2025-12-04T10:11:57.7654795Z [W1204 09:53:38.510631928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7654799Z 2025-12-04T10:11:57.7654875Z ('RERUN', {'yellow': True}) [0.4517s] [100%] 2025-12-04T10:11:57.7655602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:39.952340525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7655607Z 2025-12-04T10:11:57.7655896Z [W1204 09:53:39.952869494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7655899Z 2025-12-04T10:11:57.7656187Z [W1204 09:53:39.953008606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7656189Z 2025-12-04T10:11:57.7656474Z [W1204 09:53:39.955927736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7656477Z 2025-12-04T10:11:57.7656836Z [W1204 09:53:39.956492516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7656843Z 2025-12-04T10:11:57.7657132Z [W1204 09:53:39.956632588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7657170Z 2025-12-04T10:11:57.7657460Z [W1204 09:53:39.961202066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7657463Z 2025-12-04T10:11:57.7657755Z [W1204 09:53:39.961663684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7657758Z 2025-12-04T10:11:57.7658054Z [W1204 09:53:39.961799887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7658057Z 2025-12-04T10:11:57.7658122Z FAILED [0.4439s] [100%] 2025-12-04T10:11:57.7658125Z 2025-12-04T10:11:57.7658211Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7658505Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7658582Z Traceback (most recent call last): 2025-12-04T10:11:57.7658887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7658959Z method(*args, **kwargs) 2025-12-04T10:11:57.7659261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7659327Z method(*args, **kwargs) 2025-12-04T10:11:57.7659619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7659677Z with policy(): 2025-12-04T10:11:57.7659977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7660044Z raise RuntimeError(msg) 2025-12-04T10:11:57.7660845Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7660888Z 2025-12-04T10:11:57.7661021Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7661539Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7661542Z 2025-12-04T10:11:57.7661707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7661834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7661927Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7662276Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7662404Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7662466Z graph_break [] 2025-12-04T10:11:57.7662590Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7663278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7663360Z if out == self.unknown_value: 2025-12-04T10:11:57.7663719Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7663795Z Traceback (most recent call last): 2025-12-04T10:11:57.7664088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7664185Z method(*args, **kwargs) 2025-12-04T10:11:57.7664476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7664539Z method(*args, **kwargs) 2025-12-04T10:11:57.7664826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7664891Z with policy(): 2025-12-04T10:11:57.7665180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7665247Z raise RuntimeError(msg) 2025-12-04T10:11:57.7666066Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7666070Z 2025-12-04T10:11:57.7666196Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7666714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7666717Z 2025-12-04T10:11:57.7666875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7667002Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7667093Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7667439Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7667566Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7667624Z graph_break [] 2025-12-04T10:11:57.7667749Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7668494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7668563Z if out == self.unknown_value: 2025-12-04T10:11:57.7668685Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7668775Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7668899Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7669244Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7669301Z graph_break [] 2025-12-04T10:11:57.7669387Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7669677Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7669747Z Traceback (most recent call last): 2025-12-04T10:11:57.7670043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7670105Z method(*args, **kwargs) 2025-12-04T10:11:57.7670398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7670458Z method(*args, **kwargs) 2025-12-04T10:11:57.7670837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7670901Z with policy(): 2025-12-04T10:11:57.7671194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7671297Z raise RuntimeError(msg) 2025-12-04T10:11:57.7672109Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7672114Z 2025-12-04T10:11:57.7672236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7672759Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7672763Z 2025-12-04T10:11:57.7672916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7673042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7673134Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7673474Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7673600Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7673656Z graph_break [] 2025-12-04T10:11:57.7673781Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7674470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7674538Z if out == self.unknown_value: 2025-12-04T10:11:57.7674665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7674791Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7674913Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7675253Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7675310Z graph_break [] 2025-12-04T10:11:57.7675434Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7675522Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7675641Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7675984Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7676043Z graph_break [] 2025-12-04T10:11:57.7676532Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml - 2025-12-04T10:11:57.7676633Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7677991Z FAILED [0.4439s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7677999Z 2025-12-04T10:11:57.7678121Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7678638Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7678679Z 2025-12-04T10:11:57.7678833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7678937Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7679056Z ================== 1 failed, 57 deselected, 2 rerun in 11.88s ================== 2025-12-04T10:11:57.7679117Z Got exit code 1 2025-12-04T10:11:57.7679183Z Retrying single test... 2025-12-04T10:11:57.7679452Z W1204 09:53:45.590000 53767 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7679839Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml 2025-12-04T10:11:57.7679974Z ============================= test session starts ============================== 2025-12-04T10:11:57.7680182Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7680248Z cachedir: .pytest_cache 2025-12-04T10:11:57.7680558Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7680637Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7680702Z configfile: pytest.ini 2025-12-04T10:11:57.7681020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7681147Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7681721Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7681790Z Running 1 items in this shard 2025-12-04T10:11:57.7681837Z 2025-12-04T10:11:57.7682564Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:46.670123499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7682572Z 2025-12-04T10:11:57.7682871Z [W1204 09:53:55.804668960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7682874Z 2025-12-04T10:11:57.7683169Z [W1204 09:53:55.804924524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7683173Z 2025-12-04T10:11:57.7683462Z [W1204 09:53:55.810853006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7683465Z 2025-12-04T10:11:57.7683753Z [W1204 09:53:55.811452706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7683758Z 2025-12-04T10:11:57.7684045Z [W1204 09:53:55.811627489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7684049Z 2025-12-04T10:11:57.7684333Z [W1204 09:53:55.817078283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7684336Z 2025-12-04T10:11:57.7688529Z [W1204 09:53:55.817594412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7688650Z 2025-12-04T10:11:57.7689006Z [W1204 09:53:55.817753454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7689011Z 2025-12-04T10:11:57.7689103Z ('RERUN', {'yellow': True}) [11.0251s] [100%] 2025-12-04T10:11:57.7689894Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:57.031568185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7689899Z 2025-12-04T10:11:57.7690213Z [W1204 09:53:57.032099774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7690217Z 2025-12-04T10:11:57.7690517Z [W1204 09:53:57.032244837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7690524Z 2025-12-04T10:11:57.7690810Z [W1204 09:53:57.035163797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7690819Z 2025-12-04T10:11:57.7691103Z [W1204 09:53:57.035719727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7691111Z 2025-12-04T10:11:57.7691403Z [W1204 09:53:57.035861829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7691407Z 2025-12-04T10:11:57.7691703Z [W1204 09:53:57.040352536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7691706Z 2025-12-04T10:11:57.7691991Z [W1204 09:53:57.040817224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7691994Z 2025-12-04T10:11:57.7692296Z [W1204 09:53:57.040955767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7692300Z 2025-12-04T10:11:57.7692382Z ('RERUN', {'yellow': True}) [0.4530s] [100%] 2025-12-04T10:11:57.7693121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:57.481780307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7693162Z 2025-12-04T10:11:57.7693449Z [W1204 09:53:57.482312086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7693453Z 2025-12-04T10:11:57.7693740Z [W1204 09:53:57.482459849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7693746Z 2025-12-04T10:11:57.7694028Z [W1204 09:53:57.485346178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7694032Z 2025-12-04T10:11:57.7694317Z [W1204 09:53:57.485905158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7694326Z 2025-12-04T10:11:57.7694609Z [W1204 09:53:57.486047200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7694613Z 2025-12-04T10:11:57.7694894Z [W1204 09:53:57.490522096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7694897Z 2025-12-04T10:11:57.7695183Z [W1204 09:53:57.490991504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7695186Z 2025-12-04T10:11:57.7695550Z [W1204 09:53:57.491130707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7695554Z 2025-12-04T10:11:57.7695622Z FAILED [0.4479s] [100%] 2025-12-04T10:11:57.7695626Z 2025-12-04T10:11:57.7695751Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7696058Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7696138Z Traceback (most recent call last): 2025-12-04T10:11:57.7696454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7696524Z method(*args, **kwargs) 2025-12-04T10:11:57.7696820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7696883Z method(*args, **kwargs) 2025-12-04T10:11:57.7697176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7697237Z with policy(): 2025-12-04T10:11:57.7697542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7697614Z raise RuntimeError(msg) 2025-12-04T10:11:57.7698420Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7698424Z 2025-12-04T10:11:57.7698560Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7699086Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7699090Z 2025-12-04T10:11:57.7699253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7699389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7699527Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7699883Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7700011Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7700078Z graph_break [] 2025-12-04T10:11:57.7700213Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7700917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7700996Z if out == self.unknown_value: 2025-12-04T10:11:57.7701295Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7701378Z Traceback (most recent call last): 2025-12-04T10:11:57.7701688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7701752Z method(*args, **kwargs) 2025-12-04T10:11:57.7702046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7702107Z method(*args, **kwargs) 2025-12-04T10:11:57.7702394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7702549Z with policy(): 2025-12-04T10:11:57.7702843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7702913Z raise RuntimeError(msg) 2025-12-04T10:11:57.7703740Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7703780Z 2025-12-04T10:11:57.7703911Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7704435Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7704438Z 2025-12-04T10:11:57.7704600Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7704731Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7704827Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7705178Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7705309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7705366Z graph_break [] 2025-12-04T10:11:57.7705491Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7706177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7706248Z if out == self.unknown_value: 2025-12-04T10:11:57.7706370Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7706459Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7706582Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7706973Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7707036Z graph_break [] 2025-12-04T10:11:57.7707129Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7707420Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7707493Z Traceback (most recent call last): 2025-12-04T10:11:57.7707798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7707863Z method(*args, **kwargs) 2025-12-04T10:11:57.7708158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7708219Z method(*args, **kwargs) 2025-12-04T10:11:57.7708510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7708571Z with policy(): 2025-12-04T10:11:57.7708864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7708927Z raise RuntimeError(msg) 2025-12-04T10:11:57.7709811Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7709816Z 2025-12-04T10:11:57.7709945Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7710468Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7710507Z 2025-12-04T10:11:57.7710667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7710801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7710896Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7711245Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7711373Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7711436Z graph_break [] 2025-12-04T10:11:57.7711561Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7712268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7712339Z if out == self.unknown_value: 2025-12-04T10:11:57.7712469Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7712558Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7712680Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7713024Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7713085Z graph_break [] 2025-12-04T10:11:57.7713210Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7713295Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7713413Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7713792Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7713849Z graph_break [] 2025-12-04T10:11:57.7714346Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml - 2025-12-04T10:11:57.7714457Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7715759Z FAILED [0.4479s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7715770Z 2025-12-04T10:11:57.7715895Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7716424Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7716428Z 2025-12-04T10:11:57.7716588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7716691Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7716879Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.7716939Z Got exit code 1 2025-12-04T10:11:57.7717636Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7717971Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7718238Z W1204 09:54:04.110000 53960 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7718631Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml 2025-12-04T10:11:57.7718728Z ============================= test session starts ============================== 2025-12-04T10:11:57.7718940Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7719014Z cachedir: .pytest_cache 2025-12-04T10:11:57.7719322Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7719400Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7719471Z configfile: pytest.ini 
2025-12-04T10:11:57.7719790Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7719974Z collecting ... collected 58 items / 35 deselected / 23 selected 2025-12-04T10:11:57.7720063Z stepcurrent: skipping 35 already run items. 2025-12-04T10:11:57.7720133Z Running 23 items in this shard 2025-12-04T10:11:57.7720137Z 2025-12-04T10:11:57.7720635Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8425s] [ 4%] 2025-12-04T10:11:57.7721114Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4579s] [ 4%] 2025-12-04T10:11:57.7721558Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4541s] [ 4%] 2025-12-04T10:11:57.7721625Z 2025-12-04T10:11:57.7721710Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7721999Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7722076Z Traceback (most recent call last): 2025-12-04T10:11:57.7722384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7722451Z method(*args, **kwargs) 2025-12-04T10:11:57.7722746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7722809Z method(*args, **kwargs) 2025-12-04T10:11:57.7723096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.7723158Z with policy(): 2025-12-04T10:11:57.7723452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7723516Z raise RuntimeError(msg) 2025-12-04T10:11:57.7724307Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7724311Z 2025-12-04T10:11:57.7724537Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7725065Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7725102Z 2025-12-04T10:11:57.7725269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7725401Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7725495Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7725847Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7725972Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7726032Z graph_break [] 2025-12-04T10:11:57.7726321Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7726396Z Traceback (most recent call last): 2025-12-04T10:11:57.7726696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:11:57.7726764Z method(*args, **kwargs) 2025-12-04T10:11:57.7727053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7727119Z method(*args, **kwargs) 2025-12-04T10:11:57.7727404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7727469Z with policy(): 2025-12-04T10:11:57.7727767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7727832Z raise RuntimeError(msg) 2025-12-04T10:11:57.7728647Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7728706Z 2025-12-04T10:11:57.7728837Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7729357Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7729361Z 2025-12-04T10:11:57.7729518Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7729646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7729740Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7730087Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7730212Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7730270Z graph_break [] 2025-12-04T10:11:57.7730395Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7730484Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7730603Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7730941Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7730998Z graph_break [] 2025-12-04T10:11:57.7731082Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7731438Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7731512Z Traceback (most recent call last): 2025-12-04T10:11:57.7731815Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7732000Z method(*args, **kwargs) 2025-12-04T10:11:57.7732290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7732355Z method(*args, **kwargs) 2025-12-04T10:11:57.7732640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7732696Z with policy(): 2025-12-04T10:11:57.7732991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7733059Z raise RuntimeError(msg) 2025-12-04T10:11:57.7733862Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7733868Z 2025-12-04T10:11:57.7733995Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7734509Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7734517Z 2025-12-04T10:11:57.7734671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7734856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7735024Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7735572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7735707Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7735771Z graph_break [] 2025-12-04T10:11:57.7735960Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7736053Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7736177Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7736520Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7736591Z graph_break [] 2025-12-04T10:11:57.7736717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7736807Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7736931Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7737267Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7737334Z graph_break [] 2025-12-04T10:11:57.7737833Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml - 2025-12-04T10:11:57.7737933Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7739320Z FAILED [0.4541s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7739327Z 2025-12-04T10:11:57.7739456Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7740019Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7740023Z 2025-12-04T10:11:57.7740179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7740285Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7740399Z ================== 1 failed, 35 deselected, 2 rerun in 2.78s =================== 2025-12-04T10:11:57.7740457Z Got exit code 1 2025-12-04T10:11:57.7740525Z Retrying single test... 
2025-12-04T10:11:57.7740792Z W1204 09:54:13.734000 54141 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7741182Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml 2025-12-04T10:11:57.7741276Z ============================= test session starts ============================== 2025-12-04T10:11:57.7741483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7741550Z cachedir: .pytest_cache 2025-12-04T10:11:57.7741853Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7741938Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7742008Z configfile: pytest.ini 2025-12-04T10:11:57.7742324Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7742455Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7743021Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7743131Z Running 1 items in this shard 2025-12-04T10:11:57.7743135Z 2025-12-04T10:11:57.7743860Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:14.782817801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7743864Z 2025-12-04T10:11:57.7744163Z [W1204 09:54:23.870076425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7744167Z 2025-12-04T10:11:57.7744459Z [W1204 09:54:23.870336260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7744462Z 2025-12-04T10:11:57.7744749Z [W1204 09:54:23.876050747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7744755Z 2025-12-04T10:11:57.7745041Z [W1204 09:54:23.876642597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7745044Z 2025-12-04T10:11:57.7745330Z [W1204 09:54:23.876816911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7745333Z 2025-12-04T10:11:57.7745622Z [W1204 09:54:23.882316554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7745626Z 2025-12-04T10:11:57.7745976Z [W1204 09:54:23.882833973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7745979Z 2025-12-04T10:11:57.7746264Z [W1204 09:54:23.882994686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7746305Z 2025-12-04T10:11:57.7746386Z ('RERUN', {'yellow': True}) [10.9347s] [100%] 2025-12-04T10:11:57.7747102Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:25.045640005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7747106Z 2025-12-04T10:11:57.7747409Z [W1204 09:54:25.046183244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7747412Z 2025-12-04T10:11:57.7747702Z [W1204 09:54:25.046326437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7747706Z 2025-12-04T10:11:57.7747993Z [W1204 09:54:25.049214997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7747999Z 2025-12-04T10:11:57.7748293Z [W1204 09:54:25.049761356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7748296Z 2025-12-04T10:11:57.7748583Z [W1204 09:54:25.049899748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7748588Z 2025-12-04T10:11:57.7748871Z [W1204 09:54:25.054350355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7748875Z 2025-12-04T10:11:57.7749163Z [W1204 09:54:25.054808642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7749166Z 2025-12-04T10:11:57.7749448Z [W1204 09:54:25.054944025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7749487Z 2025-12-04T10:11:57.7749565Z ('RERUN', {'yellow': True}) [0.4078s] [100%] 2025-12-04T10:11:57.7750279Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:25.451107691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7750283Z 2025-12-04T10:11:57.7750567Z [W1204 09:54:25.451667841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7750570Z 2025-12-04T10:11:57.7750862Z [W1204 09:54:25.451813523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7750866Z 2025-12-04T10:11:57.7751151Z [W1204 09:54:25.454667302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7751157Z 2025-12-04T10:11:57.7751445Z [W1204 09:54:25.455209571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7751448Z 2025-12-04T10:11:57.7751734Z [W1204 09:54:25.455348283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7751737Z 2025-12-04T10:11:57.7752025Z [W1204 09:54:25.459780559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7752029Z 2025-12-04T10:11:57.7752381Z [W1204 09:54:25.460250507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7752384Z 2025-12-04T10:11:57.7752687Z [W1204 09:54:25.460400830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7752742Z 2025-12-04T10:11:57.7752806Z FAILED [0.4003s] [100%] 2025-12-04T10:11:57.7752810Z 2025-12-04T10:11:57.7752894Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7753183Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7753255Z Traceback (most recent call last): 2025-12-04T10:11:57.7753574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7753642Z method(*args, **kwargs) 2025-12-04T10:11:57.7753938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7754001Z method(*args, **kwargs) 2025-12-04T10:11:57.7754295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7754355Z with policy(): 2025-12-04T10:11:57.7754656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7754720Z raise RuntimeError(msg) 2025-12-04T10:11:57.7755519Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.7755525Z 2025-12-04T10:11:57.7755654Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7756175Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7756182Z 2025-12-04T10:11:57.7756342Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7756510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7756608Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7756954Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7757079Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7757141Z graph_break [] 2025-12-04T10:11:57.7757263Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7757961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7758034Z if out == self.unknown_value: 2025-12-04T10:11:57.7758329Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7758405Z Traceback (most recent call last): 2025-12-04T10:11:57.7758702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7758766Z method(*args, **kwargs) 2025-12-04T10:11:57.7759055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7759115Z method(*args, **kwargs) 2025-12-04T10:11:57.7759468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7759526Z with policy(): 2025-12-04T10:11:57.7759817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7760015Z raise RuntimeError(msg) 2025-12-04T10:11:57.7760816Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7760822Z 2025-12-04T10:11:57.7760953Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7761474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7761478Z 2025-12-04T10:11:57.7761635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7761758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7761854Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7763911Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7764100Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7764163Z graph_break [] 2025-12-04T10:11:57.7764307Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7765021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7765094Z if out == self.unknown_value: 2025-12-04T10:11:57.7765225Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7765326Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7765520Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7765899Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7765963Z graph_break [] 2025-12-04T10:11:57.7766054Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7766350Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7766427Z Traceback (most recent call last): 2025-12-04T10:11:57.7766744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7766812Z method(*args, **kwargs) 2025-12-04T10:11:57.7767109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7767175Z method(*args, **kwargs) 2025-12-04T10:11:57.7767464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7767528Z with policy(): 2025-12-04T10:11:57.7767821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7767887Z raise RuntimeError(msg) 2025-12-04T10:11:57.7768742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7768747Z 2025-12-04T10:11:57.7768879Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7769435Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7769440Z 2025-12-04T10:11:57.7769604Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7769736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7769830Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7770174Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7770304Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7770362Z graph_break [] 2025-12-04T10:11:57.7770484Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7771190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7771324Z if out == self.unknown_value: 2025-12-04T10:11:57.7771457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7771551Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7771677Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7772024Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7772081Z graph_break [] 2025-12-04T10:11:57.7772206Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7772294Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7772452Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7772806Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7772866Z graph_break [] 2025-12-04T10:11:57.7773366Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml - 2025-12-04T10:11:57.7773476Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7774762Z FAILED [0.4003s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7774773Z 2025-12-04T10:11:57.7774902Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7775419Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7775423Z 2025-12-04T10:11:57.7775583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7775721Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7775853Z ================== 1 failed, 57 deselected, 2 rerun in 11.77s ================== 2025-12-04T10:11:57.7775913Z Got exit code 1 2025-12-04T10:11:57.7775977Z Retrying single test... 2025-12-04T10:11:57.7776283Z W1204 09:54:32.096000 54327 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7776673Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml 2025-12-04T10:11:57.7776769Z ============================= test session starts ============================== 2025-12-04T10:11:57.7776984Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7777056Z cachedir: .pytest_cache 2025-12-04T10:11:57.7777362Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7777443Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7777511Z configfile: pytest.ini 2025-12-04T10:11:57.7777826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7777965Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7778579Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7778654Z Running 1 items in this shard 2025-12-04T10:11:57.7778657Z 2025-12-04T10:11:57.7779386Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:33.144211969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7779390Z 2025-12-04T10:11:57.7779690Z [W1204 09:54:42.282457396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7779694Z 2025-12-04T10:11:57.7779983Z [W1204 09:54:42.282706341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7780021Z 2025-12-04T10:11:57.7780317Z [W1204 09:54:42.289237762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7780321Z 2025-12-04T10:11:57.7780604Z [W1204 09:54:42.289815112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7780607Z 2025-12-04T10:11:57.7780896Z [W1204 09:54:42.289979845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7780900Z 2025-12-04T10:11:57.7781184Z [W1204 09:54:42.295395027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7781187Z 2025-12-04T10:11:57.7781472Z [W1204 09:54:42.295924426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7781479Z 2025-12-04T10:11:57.7781768Z [W1204 09:54:42.296088699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7781771Z 2025-12-04T10:11:57.7781854Z ('RERUN', {'yellow': True}) [10.9860s] [100%] 2025-12-04T10:11:57.7782629Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:43.460241201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7782633Z 2025-12-04T10:11:57.7782921Z [W1204 09:54:43.460803510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7782925Z 2025-12-04T10:11:57.7783251Z [W1204 09:54:43.460947613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7783256Z 2025-12-04T10:11:57.7783542Z [W1204 09:54:43.463862763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7783545Z 2025-12-04T10:11:57.7783833Z [W1204 09:54:43.464422332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7783836Z 2025-12-04T10:11:57.7784121Z [W1204 09:54:43.464560035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7784126Z 2025-12-04T10:11:57.7784410Z [W1204 09:54:43.469028751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7784416Z 2025-12-04T10:11:57.7784698Z [W1204 09:54:43.469481259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7784705Z 2025-12-04T10:11:57.7785027Z [W1204 09:54:43.469617101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7785031Z 2025-12-04T10:11:57.7785123Z ('RERUN', {'yellow': True}) [0.4078s] [100%] 2025-12-04T10:11:57.7785841Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:43.865114200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7785847Z 2025-12-04T10:11:57.7786138Z [W1204 09:54:43.865659739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7786142Z 2025-12-04T10:11:57.7786428Z [W1204 09:54:43.865803881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7786473Z 2025-12-04T10:11:57.7786760Z [W1204 09:54:43.868666770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7786763Z 2025-12-04T10:11:57.7787050Z [W1204 09:54:43.869208229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7787053Z 2025-12-04T10:11:57.7787339Z [W1204 09:54:43.869344891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7787342Z 2025-12-04T10:11:57.7787629Z [W1204 09:54:43.873815757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7787632Z 2025-12-04T10:11:57.7787924Z [W1204 09:54:43.874272465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7787932Z 2025-12-04T10:11:57.7788220Z [W1204 09:54:43.874406147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7788224Z 2025-12-04T10:11:57.7788283Z FAILED [0.4052s] [100%] 2025-12-04T10:11:57.7788287Z 2025-12-04T10:11:57.7788373Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7788663Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7788736Z Traceback (most recent call last): 2025-12-04T10:11:57.7789090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7789157Z method(*args, **kwargs) 2025-12-04T10:11:57.7789453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7789550Z method(*args, **kwargs) 2025-12-04T10:11:57.7789840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7789907Z with policy(): 2025-12-04T10:11:57.7790201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7790270Z raise RuntimeError(msg) 2025-12-04T10:11:57.7791062Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7791066Z 2025-12-04T10:11:57.7791198Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7791720Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7791726Z 2025-12-04T10:11:57.7791925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7792061Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7792157Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7792504Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7792635Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7792693Z graph_break [] 2025-12-04T10:11:57.7792825Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7793524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7793642Z if out == self.unknown_value: 2025-12-04T10:11:57.7793941Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7794015Z Traceback (most recent call last): 2025-12-04T10:11:57.7794318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7794381Z method(*args, **kwargs) 2025-12-04T10:11:57.7794669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7794733Z method(*args, **kwargs) 2025-12-04T10:11:57.7795020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7795082Z with policy(): 2025-12-04T10:11:57.7795379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7795443Z raise RuntimeError(msg) 2025-12-04T10:11:57.7796244Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7796249Z 2025-12-04T10:11:57.7796414Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7796932Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7796974Z 2025-12-04T10:11:57.7797133Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7797260Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7797355Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7797698Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7797827Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7797887Z graph_break [] 2025-12-04T10:11:57.7798013Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7798705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7798777Z if out == self.unknown_value: 2025-12-04T10:11:57.7798900Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7799037Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7799162Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7799507Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7799570Z graph_break [] 2025-12-04T10:11:57.7799655Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7799995Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7800068Z Traceback (most recent call last): 2025-12-04T10:11:57.7800370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7800476Z method(*args, **kwargs) 2025-12-04T10:11:57.7800771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7800835Z method(*args, **kwargs) 2025-12-04T10:11:57.7801122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7801179Z with policy(): 2025-12-04T10:11:57.7801476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7801540Z raise RuntimeError(msg) 2025-12-04T10:11:57.7802347Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7802354Z 2025-12-04T10:11:57.7802478Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7802994Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7803001Z 2025-12-04T10:11:57.7803156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7803325Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7803425Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7803766Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7803925Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7803987Z graph_break [] 2025-12-04T10:11:57.7804110Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7804806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7804873Z if out == self.unknown_value: 2025-12-04T10:11:57.7804995Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7805087Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7805206Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7805545Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7805606Z graph_break [] 2025-12-04T10:11:57.7805726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7805856Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7805989Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7806329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7806390Z graph_break [] 2025-12-04T10:11:57.7806878Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml - 2025-12-04T10:11:57.7806980Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7808256Z FAILED [0.4052s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7808315Z 2025-12-04T10:11:57.7808442Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7808953Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7808957Z 2025-12-04T10:11:57.7809112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7809216Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7809339Z ================== 1 failed, 57 deselected, 2 rerun in 11.82s ================== 2025-12-04T10:11:57.7809398Z Got exit code 1 2025-12-04T10:11:57.7809871Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.7810111Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7810375Z W1204 09:54:50.517000 54513 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7810794Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml 2025-12-04T10:11:57.7810895Z ============================= test session starts ============================== 2025-12-04T10:11:57.7811135Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7811201Z cachedir: .pytest_cache 2025-12-04T10:11:57.7811515Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7811588Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7811651Z configfile: pytest.ini 
2025-12-04T10:11:57.7811968Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7812096Z collecting ... collected 58 items / 36 deselected / 22 selected 2025-12-04T10:11:57.7812187Z stepcurrent: skipping 36 already run items. 2025-12-04T10:11:57.7812257Z Running 22 items in this shard 2025-12-04T10:11:57.7812261Z 2025-12-04T10:11:57.7812756Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9223s] [ 4%] 2025-12-04T10:11:57.7813280Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5383s] [ 4%] 2025-12-04T10:11:57.7813720Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5252s] [ 4%] 2025-12-04T10:11:57.7813723Z 2025-12-04T10:11:57.7813815Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7814103Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7814179Z Traceback (most recent call last): 2025-12-04T10:11:57.7814487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7814553Z method(*args, **kwargs) 2025-12-04T10:11:57.7814882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7814945Z method(*args, **kwargs) 2025-12-04T10:11:57.7815231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.7815294Z with policy(): 2025-12-04T10:11:57.7815585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7815652Z raise RuntimeError(msg) 2025-12-04T10:11:57.7816439Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7816446Z 2025-12-04T10:11:57.7816574Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7817303Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7817308Z 2025-12-04T10:11:57.7817469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7817601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7817695Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7818288Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7818423Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7818517Z graph_break [] 2025-12-04T10:11:57.7818805Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7818878Z Traceback (most recent call last): 2025-12-04T10:11:57.7819185Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7819254Z method(*args, **kwargs) 2025-12-04T10:11:57.7819542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7819607Z method(*args, **kwargs) 2025-12-04T10:11:57.7819895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7819953Z with policy(): 2025-12-04T10:11:57.7820245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7820313Z raise RuntimeError(msg) 2025-12-04T10:11:57.7821159Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7821166Z 2025-12-04T10:11:57.7821291Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7821806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7821810Z 2025-12-04T10:11:57.7821965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7822091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7822230Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7822778Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7822904Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7822966Z graph_break [] 2025-12-04T10:11:57.7823091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7823179Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7823308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7823845Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7823910Z graph_break [] 2025-12-04T10:11:57.7823993Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7824279Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7824356Z 
Traceback (most recent call last): 2025-12-04T10:11:57.7824657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7824721Z method(*args, **kwargs) 2025-12-04T10:11:57.7825048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7825110Z method(*args, **kwargs) 2025-12-04T10:11:57.7825399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7825492Z with policy(): 2025-12-04T10:11:57.7825784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7825853Z raise RuntimeError(msg) 2025-12-04T10:11:57.7826654Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7826658Z 2025-12-04T10:11:57.7826786Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7827302Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7827307Z 2025-12-04T10:11:57.7827463Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7827588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7827716Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7828260Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7828387Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7828446Z graph_break [] 2025-12-04T10:11:57.7828574Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7828663Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7828784Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7829320Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7829416Z graph_break [] 2025-12-04T10:11:57.7829537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7829622Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7829744Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.7830281Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7830338Z graph_break [] 2025-12-04T10:11:57.7830831Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml - 2025-12-04T10:11:57.7830944Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7832224Z FAILED [0.5252s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7832264Z 2025-12-04T10:11:57.7832388Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7832904Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7832942Z 2025-12-04T10:11:57.7833097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7833202Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7833322Z ================== 1 failed, 36 deselected, 2 rerun in 3.01s =================== 2025-12-04T10:11:57.7833378Z Got exit code 1 2025-12-04T10:11:57.7833442Z Retrying single test... 
2025-12-04T10:11:57.7833703Z W1204 09:55:00.200000 54695 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7834083Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml 2025-12-04T10:11:57.7834180Z ============================= test session starts ============================== 2025-12-04T10:11:57.7834386Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7834457Z cachedir: .pytest_cache 2025-12-04T10:11:57.7834760Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7834890Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7834956Z configfile: pytest.ini 2025-12-04T10:11:57.7835269Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7835398Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7835967Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7836035Z Running 1 items in this shard 2025-12-04T10:11:57.7836039Z 2025-12-04T10:11:57.7836768Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:01.781151099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7836807Z 2025-12-04T10:11:57.7837116Z [W1204 09:55:10.739943601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7837120Z 2025-12-04T10:11:57.7837412Z [W1204 09:55:10.740238936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7837416Z 2025-12-04T10:11:57.7837707Z [W1204 09:55:10.746207456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7837711Z 2025-12-04T10:11:57.7838000Z [W1204 09:55:10.746791646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7838007Z 2025-12-04T10:11:57.7838294Z [W1204 09:55:10.746951619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7838299Z 2025-12-04T10:11:57.7838584Z [W1204 09:55:10.752347129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7838591Z 2025-12-04T10:11:57.7838876Z [W1204 09:55:10.752884938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7838880Z 2025-12-04T10:11:57.7839203Z [W1204 09:55:10.753058391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7839206Z 2025-12-04T10:11:57.7839291Z ('RERUN', {'yellow': True}) [10.8960s] [100%] 2025-12-04T10:11:57.7840056Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:11.552367815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7840098Z 2025-12-04T10:11:57.7840389Z [W1204 09:55:11.552908864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7840392Z 2025-12-04T10:11:57.7840678Z [W1204 09:55:11.553053426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7840682Z 2025-12-04T10:11:57.7840971Z [W1204 09:55:11.555906215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7840975Z 2025-12-04T10:11:57.7841261Z [W1204 09:55:11.556357122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7841267Z 2025-12-04T10:11:57.7841551Z [W1204 09:55:11.556499185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7841557Z 2025-12-04T10:11:57.7841882Z [W1204 09:55:11.561060492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7841886Z 2025-12-04T10:11:57.7842173Z [W1204 09:55:11.561513349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7842176Z 2025-12-04T10:11:57.7842478Z [W1204 09:55:11.561652312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7842481Z 2025-12-04T10:11:57.7842560Z ('RERUN', {'yellow': True}) [0.4962s] [100%] 2025-12-04T10:11:57.7843277Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:12.047069967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7843318Z 2025-12-04T10:11:57.7843605Z [W1204 09:55:12.047599646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7843608Z 2025-12-04T10:11:57.7843896Z [W1204 09:55:12.047739979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7843899Z 2025-12-04T10:11:57.7844185Z [W1204 09:55:12.050620347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7844188Z 2025-12-04T10:11:57.7844476Z [W1204 09:55:12.051069204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7844482Z 2025-12-04T10:11:57.7844766Z [W1204 09:55:12.051211707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7844769Z 2025-12-04T10:11:57.7845053Z [W1204 09:55:12.055615111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7845064Z 2025-12-04T10:11:57.7845355Z [W1204 09:55:12.056061458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7845358Z 2025-12-04T10:11:57.7845757Z [W1204 09:55:12.056199000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7845760Z 2025-12-04T10:11:57.7845825Z FAILED [0.4975s] [100%] 2025-12-04T10:11:57.7845828Z 2025-12-04T10:11:57.7845910Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7846234Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7846307Z Traceback (most recent call last): 2025-12-04T10:11:57.7846616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7846684Z method(*args, **kwargs) 2025-12-04T10:11:57.7846976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7847039Z method(*args, **kwargs) 2025-12-04T10:11:57.7847330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7847391Z with policy(): 2025-12-04T10:11:57.7847684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7847751Z raise RuntimeError(msg) 2025-12-04T10:11:57.7848578Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.7848586Z 2025-12-04T10:11:57.7848724Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7849245Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7849249Z 2025-12-04T10:11:57.7849411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7849537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7849634Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7850213Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7850350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7850416Z graph_break [] 2025-12-04T10:11:57.7850541Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7851232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7851305Z if out == self.unknown_value: 2025-12-04T10:11:57.7851592Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7851668Z Traceback (most recent call last): 2025-12-04T10:11:57.7851966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7852028Z method(*args, **kwargs) 2025-12-04T10:11:57.7852318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7852381Z method(*args, **kwargs) 2025-12-04T10:11:57.7852669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7852764Z with policy(): 2025-12-04T10:11:57.7853057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7853125Z raise RuntimeError(msg) 2025-12-04T10:11:57.7853922Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7853961Z 2025-12-04T10:11:57.7854089Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7854601Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7854605Z 2025-12-04T10:11:57.7854759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7854883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7854978Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7855521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7855686Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7855744Z graph_break [] 2025-12-04T10:11:57.7855868Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7856556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7856629Z if out == self.unknown_value: 2025-12-04T10:11:57.7856750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7856839Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7857002Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7857541Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7857600Z graph_break [] 2025-12-04T10:11:57.7857683Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7857973Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7858048Z Traceback (most recent call last): 2025-12-04T10:11:57.7858344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7858409Z method(*args, **kwargs) 2025-12-04T10:11:57.7858701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7858767Z method(*args, **kwargs) 2025-12-04T10:11:57.7859058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7859115Z with policy(): 2025-12-04T10:11:57.7859403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7859471Z raise RuntimeError(msg) 2025-12-04T10:11:57.7860306Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7860310Z 2025-12-04T10:11:57.7860439Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7861015Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7861021Z 2025-12-04T10:11:57.7861174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7861301Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7861389Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7861934Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7862057Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7862112Z graph_break [] 2025-12-04T10:11:57.7862237Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7862964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7863036Z if out == self.unknown_value: 2025-12-04T10:11:57.7863157Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7863246Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7863367Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7863902Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7863964Z graph_break [] 2025-12-04T10:11:57.7864120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7864206Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7864332Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7864863Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7864926Z graph_break [] 2025-12-04T10:11:57.7865427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml - 2025-12-04T10:11:57.7865526Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7866801Z FAILED [0.4975s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7866807Z 2025-12-04T10:11:57.7866931Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7867488Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7867492Z 2025-12-04T10:11:57.7867648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7867762Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7867916Z ================== 1 failed, 57 deselected, 2 rerun in 11.91s ================== 2025-12-04T10:11:57.7867977Z Got exit code 1 2025-12-04T10:11:57.7868043Z Retrying single test... 2025-12-04T10:11:57.7868313Z W1204 09:55:18.710000 54882 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7868700Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml 2025-12-04T10:11:57.7868798Z ============================= test session starts ============================== 2025-12-04T10:11:57.7869005Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7869072Z cachedir: .pytest_cache 2025-12-04T10:11:57.7869377Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7869464Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7869534Z configfile: pytest.ini 2025-12-04T10:11:57.7869849Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7870017Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7870588Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7870658Z Running 1 items in this shard 2025-12-04T10:11:57.7870662Z 2025-12-04T10:11:57.7871393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:20.303032215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7871432Z 2025-12-04T10:11:57.7871728Z [W1204 09:55:29.179541590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7871731Z 2025-12-04T10:11:57.7872026Z [W1204 09:55:29.179794475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7872029Z 2025-12-04T10:11:57.7872315Z [W1204 09:55:29.185835228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7872318Z 2025-12-04T10:11:57.7872612Z [W1204 09:55:29.186432518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7872615Z 2025-12-04T10:11:57.7872897Z [W1204 09:55:29.186598181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7872901Z 2025-12-04T10:11:57.7873192Z [W1204 09:55:29.192032814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7873196Z 2025-12-04T10:11:57.7873481Z [W1204 09:55:29.192590753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7873485Z 2025-12-04T10:11:57.7873769Z [W1204 09:55:29.192764986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7873772Z 2025-12-04T10:11:57.7873856Z ('RERUN', {'yellow': True}) [10.8225s] [100%] 2025-12-04T10:11:57.7874604Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:30.992479370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7874641Z 2025-12-04T10:11:57.7874933Z [W1204 09:55:30.993006509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7874937Z 2025-12-04T10:11:57.7875225Z [W1204 09:55:30.993145221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7875228Z 2025-12-04T10:11:57.7875516Z [W1204 09:55:30.995998550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7875520Z 2025-12-04T10:11:57.7875804Z [W1204 09:55:30.996457008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7875807Z 2025-12-04T10:11:57.7876101Z [W1204 09:55:30.996594360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7876106Z 2025-12-04T10:11:57.7876391Z [W1204 09:55:30.001141148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7876396Z 2025-12-04T10:11:57.7876721Z [W1204 09:55:30.001620056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7876729Z 2025-12-04T10:11:57.7877013Z [W1204 09:55:30.001758248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7877018Z 2025-12-04T10:11:57.7877094Z ('RERUN', {'yellow': True}) [0.4968s] [100%] 2025-12-04T10:11:57.7877811Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:30.487933356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7877816Z 2025-12-04T10:11:57.7878101Z [W1204 09:55:30.488478596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7878138Z 2025-12-04T10:11:57.7878429Z [W1204 09:55:30.488619478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7878432Z 2025-12-04T10:11:57.7878716Z [W1204 09:55:30.491505898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7878719Z 2025-12-04T10:11:57.7879008Z [W1204 09:55:30.491954685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7879012Z 2025-12-04T10:11:57.7879298Z [W1204 09:55:30.492141558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7879301Z 2025-12-04T10:11:57.7879590Z [W1204 09:55:30.496600445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7879595Z 2025-12-04T10:11:57.7879921Z [W1204 09:55:30.497052333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7879924Z 2025-12-04T10:11:57.7880210Z [W1204 09:55:30.497187816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7880213Z 2025-12-04T10:11:57.7880276Z FAILED [0.4970s] [100%] 2025-12-04T10:11:57.7880279Z 2025-12-04T10:11:57.7880399Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7880689Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7880765Z Traceback (most recent call last): 2025-12-04T10:11:57.7881073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7881178Z method(*args, **kwargs) 2025-12-04T10:11:57.7881473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7881539Z method(*args, **kwargs) 2025-12-04T10:11:57.7881827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7881886Z with policy(): 2025-12-04T10:11:57.7882189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7882257Z raise RuntimeError(msg) 2025-12-04T10:11:57.7883047Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7883058Z 2025-12-04T10:11:57.7883186Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7883753Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7883757Z 2025-12-04T10:11:57.7883922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7884049Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7884148Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7884696Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7884862Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7884923Z graph_break [] 2025-12-04T10:11:57.7885047Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7885740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7885811Z if out == self.unknown_value: 2025-12-04T10:11:57.7886100Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7886176Z Traceback (most recent call last): 2025-12-04T10:11:57.7886473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7886537Z method(*args, **kwargs) 2025-12-04T10:11:57.7886832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7886895Z method(*args, **kwargs) 2025-12-04T10:11:57.7887183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7887241Z with policy(): 2025-12-04T10:11:57.7887532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7887600Z raise RuntimeError(msg) 2025-12-04T10:11:57.7888754Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7888795Z 2025-12-04T10:11:57.7888933Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7889454Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7889457Z 2025-12-04T10:11:57.7889616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7889741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7889835Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7890388Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7890515Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7890576Z graph_break [] 2025-12-04T10:11:57.7890701Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7891427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7891501Z if out == self.unknown_value: 2025-12-04T10:11:57.7891622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7891712Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7891839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7892378Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7892475Z graph_break [] 2025-12-04T10:11:57.7892561Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7892848Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.7892926Z Traceback (most recent call last): 2025-12-04T10:11:57.7893222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7893288Z method(*args, **kwargs) 2025-12-04T10:11:57.7893576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7893641Z method(*args, **kwargs) 2025-12-04T10:11:57.7893929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7893989Z with policy(): 2025-12-04T10:11:57.7894280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7894349Z raise RuntimeError(msg) 2025-12-04T10:11:57.7895153Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7895157Z 2025-12-04T10:11:57.7895321Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7895835Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7895871Z 2025-12-04T10:11:57.7896028Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7896154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7896243Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7896786Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7896910Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7896967Z graph_break [] 2025-12-04T10:11:57.7897093Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7897778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7897854Z if out == self.unknown_value: 2025-12-04T10:11:57.7897974Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7898099Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7898224Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7898760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7898822Z graph_break [] 2025-12-04T10:11:57.7898944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7899030Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7899152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7899725Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7899786Z graph_break [] 2025-12-04T10:11:57.7900275Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml - 2025-12-04T10:11:57.7900375Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7901665Z FAILED [0.4970s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7901673Z 2025-12-04T10:11:57.7901801Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7902322Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7902326Z 2025-12-04T10:11:57.7902482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7902622Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7902739Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ================== 2025-12-04T10:11:57.7902798Z Got exit code 1 2025-12-04T10:11:57.7903269Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.7903546Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7903811Z W1204 09:55:37.147000 55069 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7904193Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml 2025-12-04T10:11:57.7904287Z ============================= test session starts ============================== 2025-12-04T10:11:57.7904497Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7904563Z cachedir: .pytest_cache 2025-12-04T10:11:57.7904869Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7904949Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7905014Z configfile: pytest.ini 2025-12-04T10:11:57.7905367Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7905497Z collecting ... collected 58 items / 37 deselected / 21 selected 2025-12-04T10:11:57.7905584Z stepcurrent: skipping 37 already run items. 2025-12-04T10:11:57.7905656Z Running 21 items in this shard 2025-12-04T10:11:57.7905660Z 2025-12-04T10:11:57.7906161Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8839s] [ 4%] 2025-12-04T10:11:57.7906652Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4860s] [ 4%] 2025-12-04T10:11:57.7907105Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4938s] [ 4%] 2025-12-04T10:11:57.7907144Z 2025-12-04T10:11:57.7907232Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7907525Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7907598Z Traceback (most recent call last): 2025-12-04T10:11:57.7907913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7907978Z method(*args, **kwargs) 2025-12-04T10:11:57.7908270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.7908336Z method(*args, **kwargs) 2025-12-04T10:11:57.7908626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7908692Z with policy(): 2025-12-04T10:11:57.7908989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7909053Z raise RuntimeError(msg) 2025-12-04T10:11:57.7909857Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7909895Z 2025-12-04T10:11:57.7910023Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7910543Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7910582Z 2025-12-04T10:11:57.7910741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7910869Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7910966Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7911316Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7911449Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7911509Z graph_break [] 2025-12-04T10:11:57.7911799Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.7911878Z Traceback (most recent call last): 2025-12-04T10:11:57.7912169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7912238Z method(*args, **kwargs) 2025-12-04T10:11:57.7912566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7912628Z method(*args, **kwargs) 2025-12-04T10:11:57.7912918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7912975Z with policy(): 2025-12-04T10:11:57.7913268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7913347Z raise RuntimeError(msg) 2025-12-04T10:11:57.7914167Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7914225Z 2025-12-04T10:11:57.7914356Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7914875Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7914880Z 2025-12-04T10:11:57.7915036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7915163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7915257Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7915603Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7915730Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7915789Z graph_break [] 2025-12-04T10:11:57.7915913Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7916004Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7916125Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7916464Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7916523Z graph_break [] 2025-12-04T10:11:57.7916645Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7916934Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7917155Z Traceback (most recent call last): 2025-12-04T10:11:57.7917465Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7917603Z method(*args, **kwargs) 2025-12-04T10:11:57.7917896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7917959Z method(*args, **kwargs) 2025-12-04T10:11:57.7918248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7918309Z with policy(): 2025-12-04T10:11:57.7918599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7918670Z raise RuntimeError(msg) 2025-12-04T10:11:57.7919483Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7919489Z 2025-12-04T10:11:57.7919611Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7920218Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7920222Z 2025-12-04T10:11:57.7920380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7920506Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7920598Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7920937Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7921064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7921186Z graph_break [] 2025-12-04T10:11:57.7921314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7921404Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7921523Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7921866Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7921922Z graph_break [] 2025-12-04T10:11:57.7922047Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7922138Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7922256Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7922593Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7922653Z graph_break [] 2025-12-04T10:11:57.7923139Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml - 2025-12-04T10:11:57.7923239Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7924589Z FAILED [0.4938s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7924629Z 2025-12-04T10:11:57.7924760Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7925281Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7925284Z 2025-12-04T10:11:57.7925441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7925545Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7925659Z ================== 1 failed, 37 deselected, 2 rerun in 2.89s =================== 2025-12-04T10:11:57.7925721Z Got exit code 1 2025-12-04T10:11:57.7925786Z Retrying single test... 
2025-12-04T10:11:57.7926050Z W1204 09:55:46.856000 55257 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7926436Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml 2025-12-04T10:11:57.7926532Z ============================= test session starts ============================== 2025-12-04T10:11:57.7926774Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7926840Z cachedir: .pytest_cache 2025-12-04T10:11:57.7927144Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7927222Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7927287Z configfile: pytest.ini 2025-12-04T10:11:57.7927602Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7927741Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7928313Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7928424Z Running 1 items in this shard 2025-12-04T10:11:57.7928427Z 2025-12-04T10:11:57.7929160Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:55:47.936291905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7929165Z 2025-12-04T10:11:57.7929465Z [W1204 09:55:56.843829001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7929469Z 2025-12-04T10:11:57.7929758Z [W1204 09:55:56.844080485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7929763Z 2025-12-04T10:11:57.7930054Z [W1204 09:55:56.849979076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7930058Z 2025-12-04T10:11:57.7930347Z [W1204 09:55:56.850606637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7930350Z 2025-12-04T10:11:57.7930637Z [W1204 09:55:56.850781189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7930641Z 2025-12-04T10:11:57.7930962Z [W1204 09:55:56.856241443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7930966Z 2025-12-04T10:11:57.7931256Z [W1204 09:55:56.856771073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7931264Z 2025-12-04T10:11:57.7931584Z [W1204 09:55:56.856936635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7931589Z 2025-12-04T10:11:57.7931671Z ('RERUN', {'yellow': True}) [10.7931s] [100%] 2025-12-04T10:11:57.7932399Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:55:58.069861275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7932403Z 2025-12-04T10:11:57.7932696Z [W1204 09:55:58.070407264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7932699Z 2025-12-04T10:11:57.7932989Z [W1204 09:55:58.070560197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7932992Z 2025-12-04T10:11:57.7933277Z [W1204 09:55:58.073418656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7933282Z 2025-12-04T10:11:57.7933603Z [W1204 09:55:58.073963395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7933607Z 2025-12-04T10:11:57.7933892Z [W1204 09:55:58.074106357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7933895Z 2025-12-04T10:11:57.7934185Z [W1204 09:55:58.078500032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7934189Z 2025-12-04T10:11:57.7934475Z [W1204 09:55:58.078958000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7934479Z 2025-12-04T10:11:57.7934765Z [W1204 09:55:58.079096672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7934805Z 2025-12-04T10:11:57.7934886Z ('RERUN', {'yellow': True}) [0.4533s] [100%] 2025-12-04T10:11:57.7935610Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:55:58.520639124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7935615Z 2025-12-04T10:11:57.7935908Z [W1204 09:55:58.521160804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7935911Z 2025-12-04T10:11:57.7936208Z [W1204 09:55:58.521301296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7936211Z 2025-12-04T10:11:57.7936502Z [W1204 09:55:58.524125595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7936506Z 2025-12-04T10:11:57.7936793Z [W1204 09:55:58.524674664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7936797Z 2025-12-04T10:11:57.7937086Z [W1204 09:55:58.524813387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7937089Z 2025-12-04T10:11:57.7937374Z [W1204 09:55:58.529259023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7937412Z 2025-12-04T10:11:57.7937697Z [W1204 09:55:58.529724541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7937702Z 2025-12-04T10:11:57.7937985Z [W1204 09:55:58.529860963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7938024Z 2025-12-04T10:11:57.7938086Z FAILED [0.4517s] [100%] 2025-12-04T10:11:57.7938090Z 2025-12-04T10:11:57.7938177Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7938472Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7938548Z Traceback (most recent call last): 2025-12-04T10:11:57.7938855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7938921Z method(*args, **kwargs) 2025-12-04T10:11:57.7939219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7939283Z method(*args, **kwargs) 2025-12-04T10:11:57.7939569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7939634Z with policy(): 2025-12-04T10:11:57.7939983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7940054Z raise RuntimeError(msg) 2025-12-04T10:11:57.7940857Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7940864Z 2025-12-04T10:11:57.7940993Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7941518Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7941556Z 2025-12-04T10:11:57.7941715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7941844Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7941936Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7942286Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7942421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7942478Z graph_break [] 2025-12-04T10:11:57.7942604Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7943296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7943368Z if out == self.unknown_value: 2025-12-04T10:11:57.7943659Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7943732Z Traceback (most recent call last): 2025-12-04T10:11:57.7944029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7944093Z method(*args, **kwargs) 2025-12-04T10:11:57.7944384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7944486Z method(*args, **kwargs) 2025-12-04T10:11:57.7944775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7944834Z with policy(): 2025-12-04T10:11:57.7945163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7945228Z raise RuntimeError(msg) 2025-12-04T10:11:57.7946044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7946048Z 2025-12-04T10:11:57.7946173Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7946691Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7946694Z 2025-12-04T10:11:57.7946850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7946977Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7947071Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7947450Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7947580Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7947636Z graph_break [] 2025-12-04T10:11:57.7947759Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7948463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7948532Z if out == self.unknown_value: 2025-12-04T10:11:57.7948656Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7948787Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7948910Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7949256Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7949313Z graph_break [] 2025-12-04T10:11:57.7949394Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7949686Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7949756Z Traceback (most recent call last): 2025-12-04T10:11:57.7950066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7950132Z method(*args, **kwargs) 2025-12-04T10:11:57.7950425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7950491Z method(*args, **kwargs) 2025-12-04T10:11:57.7950779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7950838Z with policy(): 2025-12-04T10:11:57.7951137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7951201Z raise RuntimeError(msg) 2025-12-04T10:11:57.7952053Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7952090Z 2025-12-04T10:11:57.7952215Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7952741Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7952745Z 2025-12-04T10:11:57.7952903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7953025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7953114Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7953462Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7953586Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7953643Z graph_break [] 2025-12-04T10:11:57.7953767Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7954487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7954556Z if out == self.unknown_value: 2025-12-04T10:11:57.7954678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7954773Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7954895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7955241Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7955299Z graph_break [] 2025-12-04T10:11:57.7955421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7955630Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7955752Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7956091Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7956152Z graph_break [] 2025-12-04T10:11:57.7956640Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml - 2025-12-04T10:11:57.7956747Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7958036Z FAILED [0.4517s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7958043Z 2025-12-04T10:11:57.7958180Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7958701Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7958705Z 2025-12-04T10:11:57.7958903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7959008Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7959124Z ================== 1 failed, 57 deselected, 2 rerun in 11.72s ================== 2025-12-04T10:11:57.7959222Z Got exit code 1 2025-12-04T10:11:57.7959286Z Retrying single test... 2025-12-04T10:11:57.7959550Z W1204 09:56:05.196000 55450 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7959976Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml 2025-12-04T10:11:57.7960071Z ============================= test session starts ============================== 2025-12-04T10:11:57.7960281Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7960347Z cachedir: .pytest_cache 2025-12-04T10:11:57.7960650Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7960729Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7960793Z configfile: pytest.ini 2025-12-04T10:11:57.7961107Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7961242Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.7961853Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7961928Z Running 1 items in this shard 2025-12-04T10:11:57.7961933Z 2025-12-04T10:11:57.7962664Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:56:06.293926580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7962667Z 2025-12-04T10:11:57.7962970Z [W1204 09:56:15.296020267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7963009Z 2025-12-04T10:11:57.7963303Z [W1204 09:56:15.296269911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7963307Z 2025-12-04T10:11:57.7963598Z [W1204 09:56:15.302126701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7963602Z 2025-12-04T10:11:57.7963890Z [W1204 09:56:15.302717041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7963894Z 2025-12-04T10:11:57.7964181Z [W1204 09:56:15.302888004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7964185Z 2025-12-04T10:11:57.7964478Z [W1204 09:56:15.308270636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7964484Z 2025-12-04T10:11:57.7964781Z [W1204 09:56:15.308791585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7964786Z 2025-12-04T10:11:57.7965077Z [W1204 09:56:15.308951698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7965080Z 2025-12-04T10:11:57.7965161Z ('RERUN', {'yellow': True}) [10.9060s] [100%] 2025-12-04T10:11:57.7965921Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:56:16.524140008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7965926Z 2025-12-04T10:11:57.7966215Z [W1204 09:56:16.524681957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7966269Z 2025-12-04T10:11:57.7966560Z [W1204 09:56:16.524819159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7966565Z 2025-12-04T10:11:57.7966849Z [W1204 09:56:16.527809150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7966853Z 2025-12-04T10:11:57.7967137Z [W1204 09:56:16.528385280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7967145Z 2025-12-04T10:11:57.7967430Z [W1204 09:56:16.528523333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7967433Z 2025-12-04T10:11:57.7967718Z [W1204 09:56:16.533187112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7967724Z 2025-12-04T10:11:57.7968012Z [W1204 09:56:16.533681681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7968050Z 2025-12-04T10:11:57.7968335Z [W1204 09:56:16.533824683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7968339Z 2025-12-04T10:11:57.7968418Z ('RERUN', {'yellow': True}) [0.4553s] [100%] 2025-12-04T10:11:57.7969143Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:56:17.976013959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7969146Z 2025-12-04T10:11:57.7969435Z [W1204 09:56:17.976560149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7969473Z 2025-12-04T10:11:57.7969758Z [W1204 09:56:17.976706161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7969763Z 2025-12-04T10:11:57.7970051Z [W1204 09:56:17.979665782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7970054Z 2025-12-04T10:11:57.7970339Z [W1204 09:56:17.980253322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.7970342Z 2025-12-04T10:11:57.7970632Z [W1204 09:56:17.980405175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7970639Z 2025-12-04T10:11:57.7970924Z [W1204 09:56:17.985030734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7970930Z 2025-12-04T10:11:57.7971216Z [W1204 09:56:17.985495142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7971219Z 2025-12-04T10:11:57.7971510Z [W1204 09:56:17.985632494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.7971513Z 2025-12-04T10:11:57.7971573Z FAILED [0.4493s] [100%] 2025-12-04T10:11:57.7971576Z 2025-12-04T10:11:57.7971661Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7972001Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7972079Z Traceback (most recent call last): 2025-12-04T10:11:57.7972393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7972495Z method(*args, **kwargs) 2025-12-04T10:11:57.7972793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7972857Z method(*args, **kwargs) 2025-12-04T10:11:57.7973151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7973214Z with policy(): 2025-12-04T10:11:57.7973507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.7973569Z raise RuntimeError(msg) 2025-12-04T10:11:57.7974375Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7974383Z 2025-12-04T10:11:57.7974511Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7975070Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7975074Z 2025-12-04T10:11:57.7975231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7975361Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7975452Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7975801Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7975930Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7975990Z graph_break [] 2025-12-04T10:11:57.7976149Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7976864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7976934Z if out == self.unknown_value: 2025-12-04T10:11:57.7977224Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7977296Z Traceback (most recent call last): 2025-12-04T10:11:57.7977591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7977660Z method(*args, **kwargs) 2025-12-04T10:11:57.7977951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7978022Z method(*args, **kwargs) 2025-12-04T10:11:57.7978308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7978367Z with policy(): 2025-12-04T10:11:57.7978660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7978725Z raise RuntimeError(msg) 2025-12-04T10:11:57.7979582Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.7979587Z 2025-12-04T10:11:57.7979714Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7980267Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7980275Z 2025-12-04T10:11:57.7980434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7980558Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7980652Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7980997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7981125Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7981189Z graph_break [] 2025-12-04T10:11:57.7981310Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7982010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7982121Z if out == self.unknown_value: 2025-12-04T10:11:57.7982249Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7982343Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7982468Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7982811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7982872Z graph_break [] 2025-12-04T10:11:57.7982954Z =================================== FAILURES =================================== 2025-12-04T10:11:57.7983250Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.7983357Z Traceback (most recent call last): 2025-12-04T10:11:57.7983656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7983721Z method(*args, **kwargs) 2025-12-04T10:11:57.7984011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7984087Z method(*args, **kwargs) 2025-12-04T10:11:57.7984378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.7984439Z with policy(): 2025-12-04T10:11:57.7984734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7984798Z raise RuntimeError(msg) 2025-12-04T10:11:57.7985608Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.7985618Z 2025-12-04T10:11:57.7985743Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7986260Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7986263Z 2025-12-04T10:11:57.7986457Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7986583Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7986678Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7987056Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7987179Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7987240Z graph_break [] 2025-12-04T10:11:57.7987363Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.7988058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.7988129Z if out == self.unknown_value: 2025-12-04T10:11:57.7988253Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7988345Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7988468Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7988813Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7988910Z graph_break [] 2025-12-04T10:11:57.7989033Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.7989121Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.7989243Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.7989583Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.7989645Z graph_break [] 2025-12-04T10:11:57.7990129Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml - 2025-12-04T10:11:57.7990266Z =========================== short test summary info ============================ 2025-12-04T10:11:57.7991553Z FAILED [0.4493s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.7991558Z 2025-12-04T10:11:57.7991689Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.7992207Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7992212Z 2025-12-04T10:11:57.7992366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.7992473Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.7992591Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ================== 2025-12-04T10:11:57.7992653Z Got exit code 1 2025-12-04T10:11:57.7993127Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.7993367Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.7993688Z W1204 09:56:23.601000 55643 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.7994079Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml 2025-12-04T10:11:57.7994212Z ============================= test session starts ============================== 2025-12-04T10:11:57.7994420Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.7994486Z cachedir: .pytest_cache 2025-12-04T10:11:57.7994793Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.7994866Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.7994931Z configfile: pytest.ini 
2025-12-04T10:11:57.7995246Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.7995372Z collecting ... collected 58 items / 38 deselected / 20 selected 2025-12-04T10:11:57.7995459Z stepcurrent: skipping 38 already run items. 2025-12-04T10:11:57.7995526Z Running 20 items in this shard 2025-12-04T10:11:57.7995531Z 2025-12-04T10:11:57.7996024Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8709s] [ 5%] 2025-12-04T10:11:57.7996550Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4537s] [ 5%] 2025-12-04T10:11:57.7996992Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4477s] [ 5%] 2025-12-04T10:11:57.7996995Z 2025-12-04T10:11:57.7997084Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.7997370Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.7997444Z Traceback (most recent call last): 2025-12-04T10:11:57.7997801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7997866Z method(*args, **kwargs) 2025-12-04T10:11:57.7998162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.7998224Z method(*args, **kwargs) 2025-12-04T10:11:57.7998512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.7998574Z with policy(): 2025-12-04T10:11:57.7998869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.7998934Z raise RuntimeError(msg) 2025-12-04T10:11:57.7999724Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.7999731Z 2025-12-04T10:11:57.7999860Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8000417Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8000420Z 2025-12-04T10:11:57.8000580Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8000748Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8000843Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8001190Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8001355Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8001416Z graph_break [] 2025-12-04T10:11:57.8001706Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8001780Z Traceback (most recent call last): 2025-12-04T10:11:57.8002081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:11:57.8002146Z method(*args, **kwargs) 2025-12-04T10:11:57.8002438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8002499Z method(*args, **kwargs) 2025-12-04T10:11:57.8002807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8002868Z with policy(): 2025-12-04T10:11:57.8003168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8003234Z raise RuntimeError(msg) 2025-12-04T10:11:57.8004066Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8004071Z 2025-12-04T10:11:57.8004200Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8004716Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8004720Z 2025-12-04T10:11:57.8004880Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8005042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8005133Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8005487Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8005609Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8005669Z graph_break [] 2025-12-04T10:11:57.8005794Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8005881Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8006006Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8006342Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8006405Z graph_break [] 2025-12-04T10:11:57.8006487Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8006772Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8006848Z Traceback (most recent call last): 2025-12-04T10:11:57.8007146Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8007208Z method(*args, **kwargs) 2025-12-04T10:11:57.8007537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8007601Z method(*args, **kwargs) 2025-12-04T10:11:57.8007892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8007984Z with policy(): 2025-12-04T10:11:57.8008277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8008345Z raise RuntimeError(msg) 2025-12-04T10:11:57.8009154Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8009158Z 2025-12-04T10:11:57.8009284Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8009797Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8009801Z 2025-12-04T10:11:57.8009967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8010100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8010190Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8010572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8010696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8010753Z graph_break [] 2025-12-04T10:11:57.8010879Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8010967Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8011087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8011430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8011527Z graph_break [] 2025-12-04T10:11:57.8011651Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8011740Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8011860Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8012199Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8012255Z graph_break [] 2025-12-04T10:11:57.8012742Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml - 2025-12-04T10:11:57.8012843Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8014116Z FAILED [0.4477s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8014126Z 2025-12-04T10:11:57.8014249Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8014802Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8014806Z 2025-12-04T10:11:57.8014963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8015067Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8015220Z ================== 1 failed, 38 deselected, 2 rerun in 2.80s =================== 2025-12-04T10:11:57.8015278Z Got exit code 1 2025-12-04T10:11:57.8015342Z Retrying single test... 
2025-12-04T10:11:57.8015610Z W1204 09:56:33.263000 55824 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8015992Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml 2025-12-04T10:11:57.8016089Z ============================= test session starts ============================== 2025-12-04T10:11:57.8016299Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8016365Z cachedir: .pytest_cache 2025-12-04T10:11:57.8016675Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8016754Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8016818Z configfile: pytest.ini 2025-12-04T10:11:57.8017345Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8017486Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8018067Z stepcurrent: skipping 38 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8018144Z Running 1 items in this shard 2025-12-04T10:11:57.8018148Z 2025-12-04T10:11:57.8018882Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:34.294440433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8018958Z 2025-12-04T10:11:57.8019257Z [W1204 09:56:43.626429096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8019261Z 2025-12-04T10:11:57.8019552Z [W1204 09:56:43.626685660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8019555Z 2025-12-04T10:11:57.8019847Z [W1204 09:56:43.632544417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8019851Z 2025-12-04T10:11:57.8020136Z [W1204 09:56:43.633130987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8020139Z 2025-12-04T10:11:57.8020427Z [W1204 09:56:43.633298570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8020434Z 2025-12-04T10:11:57.8020717Z [W1204 09:56:43.638681728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8020721Z 2025-12-04T10:11:57.8021011Z [W1204 09:56:43.639211068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8021015Z 2025-12-04T10:11:57.8021298Z [W1204 09:56:43.639374481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8021302Z 2025-12-04T10:11:57.8021384Z ('RERUN', {'yellow': True}) [11.1662s] [100%] 2025-12-04T10:11:57.8022161Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:44.805550478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8022209Z 2025-12-04T10:11:57.8022500Z [W1204 09:56:44.806129398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8022503Z 2025-12-04T10:11:57.8022794Z [W1204 09:56:44.806278641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8022797Z 2025-12-04T10:11:57.8023079Z [W1204 09:56:44.809226545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8023082Z 2025-12-04T10:11:57.8023376Z [W1204 09:56:44.809791685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8023379Z 2025-12-04T10:11:57.8023668Z [W1204 09:56:44.809933568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8023673Z 2025-12-04T10:11:57.8023963Z [W1204 09:56:44.814477549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8023966Z 2025-12-04T10:11:57.8024286Z [W1204 09:56:44.814940148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8024290Z 2025-12-04T10:11:57.8024581Z [W1204 09:56:44.815077890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8024584Z 2025-12-04T10:11:57.8024663Z ('RERUN', {'yellow': True}) [0.4097s] [100%] 2025-12-04T10:11:57.8025378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:45.213234814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8025386Z 2025-12-04T10:11:57.8025706Z [W1204 09:56:45.213794894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8025709Z 2025-12-04T10:11:57.8025998Z [W1204 09:56:45.213932307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8026001Z 2025-12-04T10:11:57.8026290Z [W1204 09:56:45.216755347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8026293Z 2025-12-04T10:11:57.8026580Z [W1204 09:56:45.217294347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8026583Z 2025-12-04T10:11:57.8026875Z [W1204 09:56:45.217431960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8026879Z 2025-12-04T10:11:57.8027164Z [W1204 09:56:45.221827179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8027168Z 2025-12-04T10:11:57.8027465Z [W1204 09:56:45.222280838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8027468Z 2025-12-04T10:11:57.8027753Z [W1204 09:56:45.222417630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8027757Z 2025-12-04T10:11:57.8027818Z FAILED [0.4054s] [100%] 2025-12-04T10:11:57.8027826Z 2025-12-04T10:11:57.8027942Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8028243Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8028320Z Traceback (most recent call last): 2025-12-04T10:11:57.8028670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8028740Z method(*args, **kwargs) 2025-12-04T10:11:57.8029041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8029103Z method(*args, **kwargs) 2025-12-04T10:11:57.8029395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8029452Z with policy(): 2025-12-04T10:11:57.8029747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8029816Z raise RuntimeError(msg) 2025-12-04T10:11:57.8030605Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.8030612Z 2025-12-04T10:11:57.8030743Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8031292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8031297Z 2025-12-04T10:11:57.8031455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8031585Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8031679Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8032030Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8032157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8032249Z graph_break [] 2025-12-04T10:11:57.8032379Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8033075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8033149Z if out == self.unknown_value: 2025-12-04T10:11:57.8033437Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8033511Z Traceback (most recent call last): 2025-12-04T10:11:57.8033809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8033871Z method(*args, **kwargs) 2025-12-04T10:11:57.8034164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8034232Z method(*args, **kwargs) 2025-12-04T10:11:57.8034521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8034593Z with policy(): 2025-12-04T10:11:57.8034890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8034955Z raise RuntimeError(msg) 2025-12-04T10:11:57.8035795Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8035799Z 2025-12-04T10:11:57.8035960Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8036482Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8036486Z 2025-12-04T10:11:57.8036643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8036769Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8036862Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8037207Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8037335Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8037394Z graph_break [] 2025-12-04T10:11:57.8037519Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8038257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8038327Z if out == self.unknown_value: 2025-12-04T10:11:57.8038453Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8038542Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8038662Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8039005Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8039063Z graph_break [] 2025-12-04T10:11:57.8039149Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8039474Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8039546Z Traceback (most recent call last): 2025-12-04T10:11:57.8039845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8039950Z method(*args, **kwargs) 2025-12-04T10:11:57.8040242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8040311Z method(*args, **kwargs) 2025-12-04T10:11:57.8040602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8040663Z with policy(): 2025-12-04T10:11:57.8040955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8041023Z raise RuntimeError(msg) 2025-12-04T10:11:57.8041835Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8041839Z 2025-12-04T10:11:57.8041962Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8042666Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8042674Z 2025-12-04T10:11:57.8042871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8043007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8043150Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8043499Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8043644Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8043705Z graph_break [] 2025-12-04T10:11:57.8043833Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8044549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8044619Z if out == self.unknown_value: 2025-12-04T10:11:57.8044744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8044838Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8044963Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8045367Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8045430Z graph_break [] 2025-12-04T10:11:57.8045553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8045644Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8045767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8046118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8046180Z graph_break [] 2025-12-04T10:11:57.8046675Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml - 2025-12-04T10:11:57.8050866Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8052183Z FAILED [0.4054s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8052192Z 2025-12-04T10:11:57.8052330Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8052855Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8052863Z 2025-12-04T10:11:57.8053036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8053151Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8053275Z ================== 1 failed, 57 deselected, 2 rerun in 12.00s ================== 2025-12-04T10:11:57.8053335Z Got exit code 1 2025-12-04T10:11:57.8053401Z Retrying single test... 2025-12-04T10:11:57.8053672Z W1204 09:56:51.892000 56010 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8054104Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml 2025-12-04T10:11:57.8054207Z ============================= test session starts ============================== 2025-12-04T10:11:57.8054420Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8054526Z cachedir: .pytest_cache 2025-12-04T10:11:57.8054836Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8054916Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8054982Z configfile: pytest.ini 2025-12-04T10:11:57.8055305Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8055438Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8056013Z stepcurrent: skipping 38 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8056098Z Running 1 items in this shard 2025-12-04T10:11:57.8056102Z 2025-12-04T10:11:57.8056874Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:52.939973852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8056881Z 2025-12-04T10:11:57.8057186Z [W1204 09:57:02.114876196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8057189Z 2025-12-04T10:11:57.8057478Z [W1204 09:57:02.115112780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8057482Z 2025-12-04T10:11:57.8057794Z [W1204 09:57:02.120577807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8057797Z 2025-12-04T10:11:57.8058096Z [W1204 09:57:02.121118527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8058135Z 2025-12-04T10:11:57.8058430Z [W1204 09:57:02.121299610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8058433Z 2025-12-04T10:11:57.8058721Z [W1204 09:57:02.126642794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8058724Z 2025-12-04T10:11:57.8059014Z [W1204 09:57:02.127147263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8059018Z 2025-12-04T10:11:57.8059304Z [W1204 09:57:02.127307176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8059307Z 2025-12-04T10:11:57.8059391Z ('RERUN', {'yellow': True}) [11.0215s] [100%] 2025-12-04T10:11:57.8060129Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:57:03.292542369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8060134Z 2025-12-04T10:11:57.8060424Z [W1204 09:57:03.293125680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8060427Z 2025-12-04T10:11:57.8060717Z [W1204 09:57:03.293274052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8060721Z 2025-12-04T10:11:57.8061043Z [W1204 09:57:03.296197904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8061047Z 2025-12-04T10:11:57.8061338Z [W1204 09:57:03.296777414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8061376Z 2025-12-04T10:11:57.8061662Z [W1204 09:57:03.296917897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8061666Z 2025-12-04T10:11:57.8061954Z [W1204 09:57:03.301421836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8061959Z 2025-12-04T10:11:57.8062244Z [W1204 09:57:03.301890865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8062247Z 2025-12-04T10:11:57.8062538Z [W1204 09:57:03.302034497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8062541Z 2025-12-04T10:11:57.8062621Z ('RERUN', {'yellow': True}) [0.4095s] [100%] 2025-12-04T10:11:57.8063340Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:57:03.699437681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8063380Z 2025-12-04T10:11:57.8063672Z [W1204 09:57:03.700021791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8063675Z 2025-12-04T10:11:57.8063961Z [W1204 09:57:03.700176804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8063965Z 2025-12-04T10:11:57.8064254Z [W1204 09:57:03.703069325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8064257Z 2025-12-04T10:11:57.8064546Z [W1204 09:57:03.703627715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8064584Z 2025-12-04T10:11:57.8064881Z [W1204 09:57:03.703767297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8064886Z 2025-12-04T10:11:57.8065175Z [W1204 09:57:03.708216256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8065178Z 2025-12-04T10:11:57.8065467Z [W1204 09:57:03.708683494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8065470Z 2025-12-04T10:11:57.8065756Z [W1204 09:57:03.708824177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8065759Z 2025-12-04T10:11:57.8065822Z FAILED [0.4080s] [100%] 2025-12-04T10:11:57.8065825Z 2025-12-04T10:11:57.8065914Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8066210Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8066290Z Traceback (most recent call last): 2025-12-04T10:11:57.8066605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8066670Z method(*args, **kwargs) 2025-12-04T10:11:57.8066969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8067033Z method(*args, **kwargs) 2025-12-04T10:11:57.8067360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8067422Z with policy(): 2025-12-04T10:11:57.8067718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8067823Z raise RuntimeError(msg) 2025-12-04T10:11:57.8068622Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8068627Z 2025-12-04T10:11:57.8068764Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8069281Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8069285Z 2025-12-04T10:11:57.8069445Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8069578Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8069677Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8070032Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8070270Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8070330Z graph_break [] 2025-12-04T10:11:57.8070459Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8071161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8071235Z if out == self.unknown_value: 2025-12-04T10:11:57.8071525Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8071635Z Traceback (most recent call last): 2025-12-04T10:11:57.8071942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8072006Z method(*args, **kwargs) 2025-12-04T10:11:57.8072297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8072363Z method(*args, **kwargs) 2025-12-04T10:11:57.8072651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8072711Z with policy(): 2025-12-04T10:11:57.8073006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8073071Z raise RuntimeError(msg) 2025-12-04T10:11:57.8073879Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8073887Z 2025-12-04T10:11:57.8074017Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8074534Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8074538Z 2025-12-04T10:11:57.8074694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8074856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8074954Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8075303Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8075492Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8075551Z graph_break [] 2025-12-04T10:11:57.8075674Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8076368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8076436Z if out == self.unknown_value: 2025-12-04T10:11:57.8076570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8076661Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8076783Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8077135Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8077195Z graph_break [] 2025-12-04T10:11:57.8077313Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8077605Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8077678Z Traceback (most recent call last): 2025-12-04T10:11:57.8077981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8078043Z method(*args, **kwargs) 2025-12-04T10:11:57.8078334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8078398Z method(*args, **kwargs) 2025-12-04T10:11:57.8078695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8078795Z with policy(): 2025-12-04T10:11:57.8079090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8079155Z raise RuntimeError(msg) 2025-12-04T10:11:57.8080062Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8080067Z 2025-12-04T10:11:57.8080197Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8080717Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8080724Z 2025-12-04T10:11:57.8080887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8081014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8081112Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8081456Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8081584Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8081644Z graph_break [] 2025-12-04T10:11:57.8081816Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8082512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8082616Z if out == self.unknown_value: 2025-12-04T10:11:57.8082740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8082830Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8082950Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8083295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8083353Z graph_break [] 2025-12-04T10:11:57.8083475Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8083561Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8083682Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8084022Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8084084Z graph_break [] 2025-12-04T10:11:57.8084604Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml - 2025-12-04T10:11:57.8084710Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8085986Z FAILED [0.4080s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8086025Z 2025-12-04T10:11:57.8086152Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8086664Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8086667Z 2025-12-04T10:11:57.8086824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8086926Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8087041Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ================== 2025-12-04T10:11:57.8087103Z Got exit code 1 2025-12-04T10:11:57.8087569Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8087817Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8088086Z W1204 09:57:10.291000 56196 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8088479Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml 2025-12-04T10:11:57.8088587Z ============================= test session starts ============================== 2025-12-04T10:11:57.8088794Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8088860Z cachedir: .pytest_cache 2025-12-04T10:11:57.8089204Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8089282Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8089349Z configfile: pytest.ini 
2025-12-04T10:11:57.8089699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8089827Z collecting ... collected 58 items / 39 deselected / 19 selected 2025-12-04T10:11:57.8089920Z stepcurrent: skipping 39 already run items. 2025-12-04T10:11:57.8089990Z Running 19 items in this shard 2025-12-04T10:11:57.8089994Z 2025-12-04T10:11:57.8090486Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9148s] [ 5%] 2025-12-04T10:11:57.8090972Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5328s] [ 5%] 2025-12-04T10:11:57.8091405Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5290s] [ 5%] 2025-12-04T10:11:57.8091413Z 2025-12-04T10:11:57.8091494Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8091817Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8091895Z Traceback (most recent call last): 2025-12-04T10:11:57.8092204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8092269Z method(*args, **kwargs) 2025-12-04T10:11:57.8092564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8092624Z method(*args, **kwargs) 2025-12-04T10:11:57.8092920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3328, in wrapper 2025-12-04T10:11:57.8092981Z with policy(): 2025-12-04T10:11:57.8093322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8093389Z raise RuntimeError(msg) 2025-12-04T10:11:57.8094187Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8094191Z 2025-12-04T10:11:57.8094322Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8094835Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8094838Z 2025-12-04T10:11:57.8094993Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8095124Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8095216Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8095767Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8095896Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8095959Z graph_break [] 2025-12-04T10:11:57.8096284Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8096357Z Traceback (most recent call last): 2025-12-04T10:11:57.8096653Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8096766Z method(*args, **kwargs) 2025-12-04T10:11:57.8097061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8097128Z method(*args, **kwargs) 2025-12-04T10:11:57.8097419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8097476Z with policy(): 2025-12-04T10:11:57.8097770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8097835Z raise RuntimeError(msg) 2025-12-04T10:11:57.8098644Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8098650Z 2025-12-04T10:11:57.8098774Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8099318Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8099327Z 2025-12-04T10:11:57.8099481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8099605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8099700Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8100244Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8100373Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8100477Z graph_break [] 2025-12-04T10:11:57.8100602Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8100693Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8100813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8101350Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8101412Z graph_break [] 2025-12-04T10:11:57.8101495Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8101781Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8101851Z 
Traceback (most recent call last): 2025-12-04T10:11:57.8102148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8102215Z method(*args, **kwargs) 2025-12-04T10:11:57.8102514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8102576Z method(*args, **kwargs) 2025-12-04T10:11:57.8102873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8102931Z with policy(): 2025-12-04T10:11:57.8103280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8103347Z raise RuntimeError(msg) 2025-12-04T10:11:57.8104147Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8104190Z 2025-12-04T10:11:57.8104315Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8104825Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8104830Z 2025-12-04T10:11:57.8104991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8105120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8105217Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8105761Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8105884Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8105945Z graph_break [] 2025-12-04T10:11:57.8106100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8106188Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8106312Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8106847Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8106911Z graph_break [] 2025-12-04T10:11:57.8107030Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8107116Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8107285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.8107818Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8107878Z graph_break [] 2025-12-04T10:11:57.8108366Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml - 2025-12-04T10:11:57.8108465Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8109742Z FAILED [0.5290s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8109750Z 2025-12-04T10:11:57.8109873Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8110386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8110390Z 2025-12-04T10:11:57.8110575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8110684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8110798Z ================== 1 failed, 39 deselected, 2 rerun in 3.00s =================== 2025-12-04T10:11:57.8110889Z Got exit code 1 2025-12-04T10:11:57.8110960Z Retrying single test... 
2025-12-04T10:11:57.8111221Z W1204 09:57:19.956000 56378 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8111605Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml 2025-12-04T10:11:57.8111699Z ============================= test session starts ============================== 2025-12-04T10:11:57.8111904Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8111971Z cachedir: .pytest_cache 2025-12-04T10:11:57.8112278Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8112352Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8112419Z configfile: pytest.ini 2025-12-04T10:11:57.8112734Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8112869Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8113493Z stepcurrent: skipping 39 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8113568Z Running 1 items in this shard 2025-12-04T10:11:57.8113572Z 2025-12-04T10:11:57.8114311Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:21.538428472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8114315Z 2025-12-04T10:11:57.8114615Z [W1204 09:57:30.564998080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8114655Z 2025-12-04T10:11:57.8114948Z [W1204 09:57:30.565264425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8114952Z 2025-12-04T10:11:57.8115247Z [W1204 09:57:30.571916840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8115250Z 2025-12-04T10:11:57.8115544Z [W1204 09:57:30.572533652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8115548Z 2025-12-04T10:11:57.8115838Z [W1204 09:57:30.572705895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8115841Z 2025-12-04T10:11:57.8116130Z [W1204 09:57:30.578163948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8116136Z 2025-12-04T10:11:57.8116425Z [W1204 09:57:30.578759879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8116428Z 2025-12-04T10:11:57.8116716Z [W1204 09:57:30.578929842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8116722Z 2025-12-04T10:11:57.8116803Z ('RERUN', {'yellow': True}) [10.9693s] [100%] 2025-12-04T10:11:57.8117793Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:31.383582973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8117799Z 2025-12-04T10:11:57.8118101Z [W1204 09:57:31.384130943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8118156Z 2025-12-04T10:11:57.8118457Z [W1204 09:57:31.384271186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8118460Z 2025-12-04T10:11:57.8118752Z [W1204 09:57:31.387168150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8118755Z 2025-12-04T10:11:57.8119040Z [W1204 09:57:31.387617809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8119043Z 2025-12-04T10:11:57.8119334Z [W1204 09:57:31.387755432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8119337Z 2025-12-04T10:11:57.8119622Z [W1204 09:57:31.392373008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8119627Z 2025-12-04T10:11:57.8119968Z [W1204 09:57:31.392830137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8119972Z 2025-12-04T10:11:57.8120314Z [W1204 09:57:31.392967319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8120318Z 2025-12-04T10:11:57.8120398Z ('RERUN', {'yellow': True}) [0.5013s] [100%] 2025-12-04T10:11:57.8121136Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:31.883574099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8121140Z 2025-12-04T10:11:57.8121431Z [W1204 09:57:31.884114039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8121436Z 2025-12-04T10:11:57.8121776Z [W1204 09:57:31.884255131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8121779Z 2025-12-04T10:11:57.8122067Z [W1204 09:57:31.887165731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8122071Z 2025-12-04T10:11:57.8122363Z [W1204 09:57:31.887611449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8122367Z 2025-12-04T10:11:57.8122655Z [W1204 09:57:31.887747541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8122659Z 2025-12-04T10:11:57.8122950Z [W1204 09:57:31.892332781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8122954Z 2025-12-04T10:11:57.8123239Z [W1204 09:57:31.892791999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8123244Z 2025-12-04T10:11:57.8123532Z [W1204 09:57:31.892928701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8123537Z 2025-12-04T10:11:57.8123598Z FAILED [0.4986s] [100%] 2025-12-04T10:11:57.8123601Z 2025-12-04T10:11:57.8123681Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8123970Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8124080Z Traceback (most recent call last): 2025-12-04T10:11:57.8124395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8124463Z method(*args, **kwargs) 2025-12-04T10:11:57.8124790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8124857Z method(*args, **kwargs) 2025-12-04T10:11:57.8125147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8125204Z with policy(): 2025-12-04T10:11:57.8125504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8125571Z raise RuntimeError(msg) 2025-12-04T10:11:57.8126371Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 
2025-12-04T10:11:57.8126376Z 2025-12-04T10:11:57.8126509Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8127062Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8127071Z 2025-12-04T10:11:57.8127229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8127355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8127452Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8128007Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8128135Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8128198Z graph_break [] 2025-12-04T10:11:57.8128396Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8129095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8129166Z if out == self.unknown_value: 2025-12-04T10:11:57.8129451Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8129525Z Traceback (most recent call last): 2025-12-04T10:11:57.8129824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8129891Z method(*args, **kwargs) 2025-12-04T10:11:57.8130180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8130244Z method(*args, **kwargs) 2025-12-04T10:11:57.8130545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8130604Z with policy(): 2025-12-04T10:11:57.8130900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8130969Z raise RuntimeError(msg) 2025-12-04T10:11:57.8131815Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8131819Z 2025-12-04T10:11:57.8131950Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8132499Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8132505Z 2025-12-04T10:11:57.8132665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8132788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8132879Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8133425Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8133554Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8133615Z graph_break [] 2025-12-04T10:11:57.8133735Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8134463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8134537Z if out == self.unknown_value: 2025-12-04T10:11:57.8134662Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8134750Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8134877Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8135416Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8135481Z graph_break [] 2025-12-04T10:11:57.8135561Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8135896Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8135977Z Traceback (most recent call last): 2025-12-04T10:11:57.8136277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8136342Z method(*args, **kwargs) 2025-12-04T10:11:57.8136634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8136699Z method(*args, **kwargs) 2025-12-04T10:11:57.8136994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8137053Z with policy(): 2025-12-04T10:11:57.8137349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8137419Z raise RuntimeError(msg) 2025-12-04T10:11:57.8138236Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8138240Z 2025-12-04T10:11:57.8138372Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8138923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8138927Z 2025-12-04T10:11:57.8139087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8139209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8139335Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8139880Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8140004Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8140067Z graph_break [] 2025-12-04T10:11:57.8140187Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8140879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8140947Z if out == self.unknown_value: 2025-12-04T10:11:57.8141069Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8141164Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8141285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8141873Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8141936Z graph_break [] 2025-12-04T10:11:57.8142061Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8142156Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8142274Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8142812Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8142909Z graph_break [] 2025-12-04T10:11:57.8143398Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml - 2025-12-04T10:11:57.8143499Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8144774Z FAILED [0.4986s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8144781Z 2025-12-04T10:11:57.8144908Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8145425Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8145429Z 2025-12-04T10:11:57.8145587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8145688Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8145803Z ================== 1 failed, 57 deselected, 2 rerun in 11.99s ================== 2025-12-04T10:11:57.8145863Z Got exit code 1 2025-12-04T10:11:57.8145962Z Retrying single test... 2025-12-04T10:11:57.8146222Z W1204 09:57:38.558000 56565 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8146606Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml 2025-12-04T10:11:57.8146735Z ============================= test session starts ============================== 2025-12-04T10:11:57.8146945Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8147012Z cachedir: .pytest_cache 2025-12-04T10:11:57.8147316Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8147395Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8147459Z configfile: pytest.ini 2025-12-04T10:11:57.8147773Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8147909Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8148475Z stepcurrent: skipping 39 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8148551Z Running 1 items in this shard 2025-12-04T10:11:57.8148554Z 2025-12-04T10:11:57.8149315Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:40.155449721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8149320Z 2025-12-04T10:11:57.8149624Z [W1204 09:57:49.240479018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8149628Z 2025-12-04T10:11:57.8149922Z [W1204 09:57:49.240729032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8149925Z 2025-12-04T10:11:57.8150217Z [W1204 09:57:49.246537042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8150254Z 2025-12-04T10:11:57.8150547Z [W1204 09:57:49.247122893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8150550Z 2025-12-04T10:11:57.8150837Z [W1204 09:57:49.247296356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8150840Z 2025-12-04T10:11:57.8151130Z [W1204 09:57:49.252693299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8151135Z 2025-12-04T10:11:57.8151424Z [W1204 09:57:49.253243059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8151427Z 2025-12-04T10:11:57.8151716Z [W1204 09:57:49.253412951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8151722Z 2025-12-04T10:11:57.8151803Z ('RERUN', {'yellow': True}) [11.0359s] [100%] 2025-12-04T10:11:57.8152524Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:50.052927236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8152527Z 2025-12-04T10:11:57.8152850Z [W1204 09:57:50.053481766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8152854Z 2025-12-04T10:11:57.8153144Z [W1204 09:57:50.053621988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8153147Z 2025-12-04T10:11:57.8153436Z [W1204 09:57:50.056524768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8153473Z 2025-12-04T10:11:57.8153761Z [W1204 09:57:50.056975636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8153768Z 2025-12-04T10:11:57.8154053Z [W1204 09:57:50.057114908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8154056Z 2025-12-04T10:11:57.8154341Z [W1204 09:57:50.061687357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8154346Z 2025-12-04T10:11:57.8154635Z [W1204 09:57:50.062148005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8154640Z 2025-12-04T10:11:57.8154927Z [W1204 09:57:50.062284488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8154933Z 2025-12-04T10:11:57.8155015Z ('RERUN', {'yellow': True}) [0.4964s] [100%] 2025-12-04T10:11:57.8155790Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:50.548003983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8155795Z 2025-12-04T10:11:57.8156087Z [W1204 09:57:50.548569503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8156092Z 2025-12-04T10:11:57.8156381Z [W1204 09:57:50.548709605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8156385Z 2025-12-04T10:11:57.8156679Z [W1204 09:57:50.551619595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8156719Z 2025-12-04T10:11:57.8157012Z [W1204 09:57:50.552072053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8157015Z 2025-12-04T10:11:57.8157301Z [W1204 09:57:50.552211806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8157304Z 2025-12-04T10:11:57.8157590Z [W1204 09:57:50.556690953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8157593Z 2025-12-04T10:11:57.8157879Z [W1204 09:57:50.557141651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8157882Z 2025-12-04T10:11:57.8158173Z [W1204 09:57:50.557281523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8158179Z 2025-12-04T10:11:57.8158240Z FAILED [0.4943s] [100%] 2025-12-04T10:11:57.8158243Z 2025-12-04T10:11:57.8158330Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8158618Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8158689Z Traceback (most recent call last): 2025-12-04T10:11:57.8158999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8159062Z method(*args, **kwargs) 2025-12-04T10:11:57.8159398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8159460Z method(*args, **kwargs) 2025-12-04T10:11:57.8159751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8159846Z with policy(): 2025-12-04T10:11:57.8160197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8160265Z raise RuntimeError(msg) 2025-12-04T10:11:57.8161066Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8161071Z 2025-12-04T10:11:57.8161200Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8161718Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8161725Z 2025-12-04T10:11:57.8161881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8162009Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8162140Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8162687Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8162815Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8162873Z graph_break [] 2025-12-04T10:11:57.8162994Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8163685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8163793Z if out == self.unknown_value: 2025-12-04T10:11:57.8164086Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8164158Z Traceback (most recent call last): 2025-12-04T10:11:57.8164455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8164523Z method(*args, **kwargs) 2025-12-04T10:11:57.8164815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8164879Z method(*args, **kwargs) 2025-12-04T10:11:57.8165166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8165226Z with policy(): 2025-12-04T10:11:57.8165523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8165588Z raise RuntimeError(msg) 2025-12-04T10:11:57.8166399Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8166403Z 2025-12-04T10:11:57.8166527Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8167073Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8167081Z 2025-12-04T10:11:57.8167236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8167396Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8167489Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8168034Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8168157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8168217Z graph_break [] 2025-12-04T10:11:57.8168339Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8169029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8169098Z if out == self.unknown_value: 2025-12-04T10:11:57.8169218Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8169309Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8169464Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8170004Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8170061Z graph_break [] 2025-12-04T10:11:57.8170143Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8170431Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8170505Z Traceback (most recent call last): 2025-12-04T10:11:57.8170800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8170898Z method(*args, **kwargs) 2025-12-04T10:11:57.8171192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8171258Z method(*args, **kwargs) 2025-12-04T10:11:57.8171549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8171610Z with policy(): 2025-12-04T10:11:57.8171907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8171972Z raise RuntimeError(msg) 2025-12-04T10:11:57.8172777Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8172783Z 2025-12-04T10:11:57.8172908Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8173418Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8173425Z 2025-12-04T10:11:57.8173578Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8173734Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8173827Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8174366Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8174524Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8174584Z graph_break [] 2025-12-04T10:11:57.8174706Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8175395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8175462Z if out == self.unknown_value: 2025-12-04T10:11:57.8175582Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8175674Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8175792Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8176329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8176435Z graph_break [] 2025-12-04T10:11:57.8176559Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8176649Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8176766Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8177306Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8177363Z graph_break [] 2025-12-04T10:11:57.8177846Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml - 2025-12-04T10:11:57.8177985Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8179257Z FAILED [0.4943s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8179264Z 2025-12-04T10:11:57.8179387Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8179900Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8179906Z 2025-12-04T10:11:57.8180067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8180169Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8180283Z ================== 1 failed, 57 deselected, 2 rerun in 12.05s ================== 2025-12-04T10:11:57.8180342Z Got exit code 1 2025-12-04T10:11:57.8180809Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8181088Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8181350Z W1204 09:57:57.141000 56752 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8181732Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml 2025-12-04T10:11:57.8181954Z ============================= test session starts ============================== 2025-12-04T10:11:57.8182160Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8182225Z cachedir: .pytest_cache 2025-12-04T10:11:57.8182530Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8182614Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8182682Z configfile: pytest.ini 2025-12-04T10:11:57.8182997Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8183123Z collecting ... collected 58 items / 40 deselected / 18 selected 2025-12-04T10:11:57.8183213Z stepcurrent: skipping 40 already run items. 2025-12-04T10:11:57.8183282Z Running 18 items in this shard 2025-12-04T10:11:57.8183288Z 2025-12-04T10:11:57.8183831Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0367s] [ 5%] 2025-12-04T10:11:57.8184325Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6604s] [ 5%] 2025-12-04T10:11:57.8184778Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6525s] [ 5%] 2025-12-04T10:11:57.8184785Z 2025-12-04T10:11:57.8184867Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8185164Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8185275Z Traceback (most recent call last): 2025-12-04T10:11:57.8185580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8185644Z method(*args, **kwargs) 2025-12-04T10:11:57.8185941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.8186003Z method(*args, **kwargs) 2025-12-04T10:11:57.8186295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8186352Z with policy(): 2025-12-04T10:11:57.8186646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8186712Z raise RuntimeError(msg) 2025-12-04T10:11:57.8187527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8187534Z 2025-12-04T10:11:57.8187660Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8188187Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8188191Z 2025-12-04T10:11:57.8188381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8188511Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8188604Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8188956Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8189117Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8189176Z graph_break [] 2025-12-04T10:11:57.8189472Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.8189543Z Traceback (most recent call last): 2025-12-04T10:11:57.8189842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8189906Z method(*args, **kwargs) 2025-12-04T10:11:57.8190195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8190258Z method(*args, **kwargs) 2025-12-04T10:11:57.8190543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8190603Z with policy(): 2025-12-04T10:11:57.8190897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8190995Z raise RuntimeError(msg) 2025-12-04T10:11:57.8191828Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8191832Z 2025-12-04T10:11:57.8191969Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8192496Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8192541Z 2025-12-04T10:11:57.8192696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8192820Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8192914Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8193258Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8193382Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8193442Z graph_break [] 2025-12-04T10:11:57.8193564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8193651Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8193770Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8194106Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8194166Z graph_break [] 2025-12-04T10:11:57.8194248Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8194540Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8194613Z Traceback (most recent call last): 2025-12-04T10:11:57.8194912Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8194976Z method(*args, **kwargs) 2025-12-04T10:11:57.8195315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8195379Z method(*args, **kwargs) 2025-12-04T10:11:57.8195671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8195768Z with policy(): 2025-12-04T10:11:57.8196067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8196131Z raise RuntimeError(msg) 2025-12-04T10:11:57.8196952Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8196956Z 2025-12-04T10:11:57.8197082Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8197604Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8197612Z 2025-12-04T10:11:57.8197766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8197920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8198009Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8198358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8198479Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8198538Z graph_break [] 2025-12-04T10:11:57.8198659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8198745Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8198865Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8199201Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8199293Z graph_break [] 2025-12-04T10:11:57.8199421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8199507Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8199628Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8200002Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8200062Z graph_break [] 2025-12-04T10:11:57.8200555Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml - 2025-12-04T10:11:57.8200653Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8201975Z FAILED [0.6525s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8201979Z 2025-12-04T10:11:57.8202100Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8202676Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8202680Z 2025-12-04T10:11:57.8202869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8202977Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8203095Z ================== 1 failed, 40 deselected, 2 rerun in 3.37s =================== 2025-12-04T10:11:57.8203154Z Got exit code 1 2025-12-04T10:11:57.8203220Z Retrying single test... 
2025-12-04T10:11:57.8203483Z W1204 09:58:07.056000 56941 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8203869Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml 2025-12-04T10:11:57.8203967Z ============================= test session starts ============================== 2025-12-04T10:11:57.8204171Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8204234Z cachedir: .pytest_cache 2025-12-04T10:11:57.8204540Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8204614Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8204681Z configfile: pytest.ini 2025-12-04T10:11:57.8205026Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8205154Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8205732Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8205801Z Running 1 items in this shard 2025-12-04T10:11:57.8205805Z 2025-12-04T10:11:57.8206550Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:08.276757045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8206589Z 2025-12-04T10:11:57.8206890Z [W1204 09:58:17.285777770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8206894Z 2025-12-04T10:11:57.8207189Z [W1204 09:58:17.286034515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8207193Z 2025-12-04T10:11:57.8207487Z [W1204 09:58:17.291679423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8207490Z 2025-12-04T10:11:57.8207785Z [W1204 09:58:17.292246553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8207791Z 2025-12-04T10:11:57.8208079Z [W1204 09:58:17.292422866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8208084Z 2025-12-04T10:11:57.8208373Z [W1204 09:58:17.297819410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8208376Z 2025-12-04T10:11:57.8208668Z [W1204 09:58:17.298361499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8208672Z 2025-12-04T10:11:57.8209015Z [W1204 09:58:17.298536092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8209019Z 2025-12-04T10:11:57.8209102Z ('RERUN', {'yellow': True}) [11.0330s] [100%] 2025-12-04T10:11:57.8209838Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:18.636102449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8209876Z 2025-12-04T10:11:57.8210170Z [W1204 09:58:18.636636128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8210173Z 2025-12-04T10:11:57.8210459Z [W1204 09:58:18.636775141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8210463Z 2025-12-04T10:11:57.8210755Z [W1204 09:58:18.639728201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8210758Z 2025-12-04T10:11:57.8211045Z [W1204 09:58:18.640333072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8211048Z 2025-12-04T10:11:57.8211333Z [W1204 09:58:18.640481334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8211342Z 2025-12-04T10:11:57.8211672Z [W1204 09:58:18.645096515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8211677Z 2025-12-04T10:11:57.8211966Z [W1204 09:58:18.645564813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8211970Z 2025-12-04T10:11:57.8212260Z [W1204 09:58:18.645702935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8212265Z 2025-12-04T10:11:57.8212343Z ('RERUN', {'yellow': True}) [0.5812s] [100%] 2025-12-04T10:11:57.8213076Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:19.214436229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8213115Z 2025-12-04T10:11:57.8213402Z [W1204 09:58:19.214954518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8213406Z 2025-12-04T10:11:57.8213697Z [W1204 09:58:19.215093271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8213700Z 2025-12-04T10:11:57.8213985Z [W1204 09:58:19.217994290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8213990Z 2025-12-04T10:11:57.8214274Z [W1204 09:58:19.218541190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8214281Z 2025-12-04T10:11:57.8214566Z [W1204 09:58:19.218679562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8214573Z 2025-12-04T10:11:57.8214859Z [W1204 09:58:19.223174950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8214862Z 2025-12-04T10:11:57.8215148Z [W1204 09:58:19.223631518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8215151Z 2025-12-04T10:11:57.8215440Z [W1204 09:58:19.223768081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8215444Z 2025-12-04T10:11:57.8215540Z FAILED [0.5804s] [100%] 2025-12-04T10:11:57.8215544Z 2025-12-04T10:11:57.8215626Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8215922Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8216042Z Traceback (most recent call last): 2025-12-04T10:11:57.8216350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8216416Z method(*args, **kwargs) 2025-12-04T10:11:57.8216710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8216773Z method(*args, **kwargs) 2025-12-04T10:11:57.8217228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8217290Z with policy(): 2025-12-04T10:11:57.8217586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8217660Z raise RuntimeError(msg) 2025-12-04T10:11:57.8218535Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8218543Z 2025-12-04T10:11:57.8218675Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8219199Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8219203Z 2025-12-04T10:11:57.8219366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8219489Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8219582Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8219932Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8220105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8220169Z graph_break [] 2025-12-04T10:11:57.8220293Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8220988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8221061Z if out == self.unknown_value: 2025-12-04T10:11:57.8221353Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8221427Z Traceback (most recent call last): 2025-12-04T10:11:57.8221728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8221792Z method(*args, **kwargs) 2025-12-04T10:11:57.8222092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8222153Z method(*args, **kwargs) 2025-12-04T10:11:57.8222442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8222503Z with policy(): 2025-12-04T10:11:57.8222798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8222914Z raise RuntimeError(msg) 2025-12-04T10:11:57.8223743Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8223793Z 2025-12-04T10:11:57.8223920Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8224449Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8224453Z 2025-12-04T10:11:57.8224608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8224736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8224827Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8225175Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8225302Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8225359Z graph_break [] 2025-12-04T10:11:57.8225485Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8226240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8226311Z if out == self.unknown_value: 2025-12-04T10:11:57.8226434Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8226524Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8226647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8226990Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8227084Z graph_break [] 2025-12-04T10:11:57.8227174Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8227471Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8227544Z Traceback (most recent call last): 2025-12-04T10:11:57.8227854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8227917Z method(*args, **kwargs) 2025-12-04T10:11:57.8228215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8228283Z method(*args, **kwargs) 2025-12-04T10:11:57.8228571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8228633Z with policy(): 2025-12-04T10:11:57.8228927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8228994Z raise RuntimeError(msg) 2025-12-04T10:11:57.8229821Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8229825Z 2025-12-04T10:11:57.8229948Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8230517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8230556Z 2025-12-04T10:11:57.8230712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8230838Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8230927Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8231271Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8231392Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8231449Z graph_break [] 2025-12-04T10:11:57.8231573Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8232270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8232341Z if out == self.unknown_value: 2025-12-04T10:11:57.8232468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8232556Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8232714Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8233053Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8233109Z graph_break [] 2025-12-04T10:11:57.8233230Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8233317Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8233436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8233773Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8233885Z graph_break [] 2025-12-04T10:11:57.8234372Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml - 2025-12-04T10:11:57.8234472Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8235788Z FAILED [0.5804s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8235792Z 2025-12-04T10:11:57.8235915Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8236443Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8236453Z 2025-12-04T10:11:57.8236607Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8236707Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8236824Z ================== 1 failed, 57 deselected, 2 rerun in 12.22s ================== 2025-12-04T10:11:57.8236882Z Got exit code 1 2025-12-04T10:11:57.8236987Z Retrying single test... 2025-12-04T10:11:57.8237259Z W1204 09:58:25.851000 57135 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8237641Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml 2025-12-04T10:11:57.8237779Z ============================= test session starts ============================== 2025-12-04T10:11:57.8237986Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8238049Z cachedir: .pytest_cache 2025-12-04T10:11:57.8238352Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8238425Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8238489Z configfile: pytest.ini 2025-12-04T10:11:57.8238805Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8238931Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8239507Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8239578Z Running 1 items in this shard 2025-12-04T10:11:57.8239582Z 2025-12-04T10:11:57.8240393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:27.069214635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8240397Z 2025-12-04T10:11:57.8240697Z [W1204 09:58:36.174502547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8240702Z 2025-12-04T10:11:57.8240994Z [W1204 09:58:36.174756532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8241000Z 2025-12-04T10:11:57.8241288Z [W1204 09:58:36.180653513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8241326Z 2025-12-04T10:11:57.8241615Z [W1204 09:58:36.181254664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8241618Z 2025-12-04T10:11:57.8241909Z [W1204 09:58:36.181434447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8241913Z 2025-12-04T10:11:57.8242200Z [W1204 09:58:36.186929802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8242204Z 2025-12-04T10:11:57.8242492Z [W1204 09:58:36.187485392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8242496Z 2025-12-04T10:11:57.8242782Z [W1204 09:58:36.187658325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8242789Z 2025-12-04T10:11:57.8242870Z ('RERUN', {'yellow': True}) [11.1213s] [100%] 2025-12-04T10:11:57.8243603Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:37.514024261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8243607Z 2025-12-04T10:11:57.8243903Z [W1204 09:58:37.514553700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8243949Z 2025-12-04T10:11:57.8244241Z [W1204 09:58:37.514696453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8244244Z 2025-12-04T10:11:57.8244530Z [W1204 09:58:37.517561143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8244572Z 2025-12-04T10:11:57.8244861Z [W1204 09:58:37.518111912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8244865Z 2025-12-04T10:11:57.8245151Z [W1204 09:58:37.518256755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8245155Z 2025-12-04T10:11:57.8245448Z [W1204 09:58:37.522693542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8245452Z 2025-12-04T10:11:57.8245742Z [W1204 09:58:37.523149000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8245745Z 2025-12-04T10:11:57.8246038Z [W1204 09:58:37.523284182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8246043Z 2025-12-04T10:11:57.8246120Z ('RERUN', {'yellow': True}) [0.5752s] [100%] 2025-12-04T10:11:57.8246887Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:38.087010680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8246891Z 2025-12-04T10:11:57.8247180Z [W1204 09:58:38.087539439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8247183Z 2025-12-04T10:11:57.8247469Z [W1204 09:58:38.087683152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8247475Z 2025-12-04T10:11:57.8247764Z [W1204 09:58:38.090525091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8247801Z 2025-12-04T10:11:57.8248089Z [W1204 09:58:38.091076211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8248093Z 2025-12-04T10:11:57.8248381Z [W1204 09:58:38.091215173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8248384Z 2025-12-04T10:11:57.8248668Z [W1204 09:58:38.095615479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8248671Z 2025-12-04T10:11:57.8248962Z [W1204 09:58:38.096062097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8248966Z 2025-12-04T10:11:57.8249252Z [W1204 09:58:38.096198400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8249258Z 2025-12-04T10:11:57.8249324Z FAILED [0.5677s] [100%] 2025-12-04T10:11:57.8249328Z 2025-12-04T10:11:57.8249410Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8249715Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8249795Z Traceback (most recent call last): 2025-12-04T10:11:57.8250100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8250167Z method(*args, **kwargs) 2025-12-04T10:11:57.8250499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8250563Z method(*args, **kwargs) 2025-12-04T10:11:57.8250852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8250947Z with policy(): 2025-12-04T10:11:57.8251242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8251313Z raise RuntimeError(msg) 2025-12-04T10:11:57.8252122Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8252126Z 2025-12-04T10:11:57.8252257Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8252782Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8252788Z 2025-12-04T10:11:57.8252948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8253071Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8253200Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8253550Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8253673Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8253734Z graph_break [] 2025-12-04T10:11:57.8253860Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8254553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8254660Z if out == self.unknown_value: 2025-12-04T10:11:57.8254952Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8255026Z Traceback (most recent call last): 2025-12-04T10:11:57.8255322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8255384Z method(*args, **kwargs) 2025-12-04T10:11:57.8255678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8255741Z method(*args, **kwargs) 2025-12-04T10:11:57.8256032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8256093Z with policy(): 2025-12-04T10:11:57.8256385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8256453Z raise RuntimeError(msg) 2025-12-04T10:11:57.8257285Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8257290Z 2025-12-04T10:11:57.8257414Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8258224Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8258229Z 2025-12-04T10:11:57.8258388Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8258552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8258645Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8259004Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8259131Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8259188Z graph_break [] 2025-12-04T10:11:57.8259312Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8260006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8260073Z if out == self.unknown_value: 2025-12-04T10:11:57.8260197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8260289Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8260410Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8260804Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8260864Z graph_break [] 2025-12-04T10:11:57.8260949Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8261240Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8261314Z Traceback (most recent call last): 2025-12-04T10:11:57.8261614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8261676Z method(*args, **kwargs) 2025-12-04T10:11:57.8261973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8262072Z method(*args, **kwargs) 2025-12-04T10:11:57.8262361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8262421Z with policy(): 2025-12-04T10:11:57.8262714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8262780Z raise RuntimeError(msg) 2025-12-04T10:11:57.8263611Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8263617Z 2025-12-04T10:11:57.8263737Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8264265Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8264269Z 2025-12-04T10:11:57.8264424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8264550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8264637Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8265011Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8265135Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8265202Z graph_break [] 2025-12-04T10:11:57.8265361Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8266048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8266114Z if out == self.unknown_value: 2025-12-04T10:11:57.8266236Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8266324Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8266443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8266788Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8266845Z graph_break [] 2025-12-04T10:11:57.8266966Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8267054Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8267174Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8267546Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8267603Z graph_break [] 2025-12-04T10:11:57.8268089Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml - 2025-12-04T10:11:57.8268190Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8269503Z FAILED [0.5677s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8269546Z 2025-12-04T10:11:57.8269668Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8270191Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8270194Z 2025-12-04T10:11:57.8270351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8270455Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8270573Z ================== 1 failed, 57 deselected, 2 rerun in 12.29s ================== 2025-12-04T10:11:57.8270634Z Got exit code 1 2025-12-04T10:11:57.8271116Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8271372Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8271633Z W1204 09:58:44.714000 57329 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8272019Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml 2025-12-04T10:11:57.8272150Z ============================= test session starts ============================== 2025-12-04T10:11:57.8272357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8272426Z cachedir: .pytest_cache 2025-12-04T10:11:57.8272762Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8272838Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8272907Z configfile: 
pytest.ini 2025-12-04T10:11:57.8273222Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8273349Z collecting ... collected 58 items / 41 deselected / 17 selected 2025-12-04T10:11:57.8273433Z stepcurrent: skipping 41 already run items. 2025-12-04T10:11:57.8273500Z Running 17 items in this shard 2025-12-04T10:11:57.8273504Z 2025-12-04T10:11:57.8274010Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8657s] [ 5%] 2025-12-04T10:11:57.8274499Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4558s] [ 5%] 2025-12-04T10:11:57.8274979Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4519s] [ 5%] 2025-12-04T10:11:57.8274983Z 2025-12-04T10:11:57.8275064Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8275362Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8275437Z Traceback (most recent call last): 2025-12-04T10:11:57.8275744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8275810Z method(*args, **kwargs) 2025-12-04T10:11:57.8276102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8276201Z method(*args, **kwargs) 2025-12-04T10:11:57.8276497Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8276558Z with policy(): 2025-12-04T10:11:57.8276852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8276916Z raise RuntimeError(msg) 2025-12-04T10:11:57.8277718Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8277722Z 2025-12-04T10:11:57.8277860Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8278385Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8278390Z 2025-12-04T10:11:57.8278547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8278673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8278764Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8279148Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8279274Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8279333Z graph_break [] 2025-12-04T10:11:57.8279623Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8279729Z Traceback (most recent call last): 2025-12-04T10:11:57.8280112Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8280178Z method(*args, **kwargs) 2025-12-04T10:11:57.8280470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8280534Z method(*args, **kwargs) 2025-12-04T10:11:57.8280821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8280884Z with policy(): 2025-12-04T10:11:57.8281178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8281243Z raise RuntimeError(msg) 2025-12-04T10:11:57.8282058Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8282104Z 2025-12-04T10:11:57.8282228Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8282750Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8282753Z 2025-12-04T10:11:57.8282909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8283036Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8283126Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8283469Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8283631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8283688Z graph_break [] 2025-12-04T10:11:57.8283812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8283903Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8284022Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8284365Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8284426Z graph_break [] 2025-12-04T10:11:57.8284506Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8284797Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8284870Z Traceback (most recent call last): 2025-12-04T10:11:57.8285167Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8285235Z method(*args, **kwargs) 2025-12-04T10:11:57.8285532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8285596Z method(*args, **kwargs) 2025-12-04T10:11:57.8285886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8285944Z with policy(): 2025-12-04T10:11:57.8286277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8286343Z raise RuntimeError(msg) 2025-12-04T10:11:57.8287161Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8287219Z 2025-12-04T10:11:57.8287343Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8287863Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8287870Z 2025-12-04T10:11:57.8288025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8288150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8288251Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8288605Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8288731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8288793Z graph_break [] 2025-12-04T10:11:57.8288950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8289043Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8289163Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8289508Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8289570Z graph_break [] 2025-12-04T10:11:57.8289691Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8289776Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8289899Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8290270Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8290333Z graph_break [] 2025-12-04T10:11:57.8290823Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml - 2025-12-04T10:11:57.8290921Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8292229Z FAILED [0.4519s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8292237Z 2025-12-04T10:11:57.8292358Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8292879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8292883Z 2025-12-04T10:11:57.8293042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8293261Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8293378Z ================== 1 failed, 41 deselected, 2 rerun in 2.80s =================== 2025-12-04T10:11:57.8293436Z Got exit code 1 2025-12-04T10:11:57.8293504Z Retrying single test... 
2025-12-04T10:11:57.8293764Z W1204 09:58:54.422000 57517 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8294188Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml 2025-12-04T10:11:57.8294285Z ============================= test session starts ============================== 2025-12-04T10:11:57.8294492Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8294559Z cachedir: .pytest_cache 2025-12-04T10:11:57.8294861Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8294937Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8295003Z configfile: pytest.ini 2025-12-04T10:11:57.8295318Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8295449Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8296063Z stepcurrent: skipping 41 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8296133Z Running 1 items in this shard 2025-12-04T10:11:57.8296137Z 2025-12-04T10:11:57.8296869Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:58:55.488246137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8296873Z 2025-12-04T10:11:57.8297171Z [W1204 09:59:04.552358582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8297175Z 2025-12-04T10:11:57.8297469Z [W1204 09:59:04.552594176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8297507Z 2025-12-04T10:11:57.8297805Z [W1204 09:59:04.558099260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8297809Z 2025-12-04T10:11:57.8298097Z [W1204 09:59:04.558620979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8298100Z 2025-12-04T10:11:57.8298390Z [W1204 09:59:04.558791692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8298394Z 2025-12-04T10:11:57.8298706Z [W1204 09:59:04.564106963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8298709Z 2025-12-04T10:11:57.8299006Z [W1204 09:59:04.564653142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8299012Z 2025-12-04T10:11:57.8299303Z [W1204 09:59:04.564825245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8299308Z 2025-12-04T10:11:57.8299391Z ('RERUN', {'yellow': True}) [10.9340s] [100%] 2025-12-04T10:11:57.8300125Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:05.738239605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8300162Z 2025-12-04T10:11:57.8300458Z [W1204 09:59:05.738778935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8300462Z 2025-12-04T10:11:57.8300747Z [W1204 09:59:05.738925917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8300786Z 2025-12-04T10:11:57.8301077Z [W1204 09:59:05.741844537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8301080Z 2025-12-04T10:11:57.8301367Z [W1204 09:59:05.742421617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8301370Z 2025-12-04T10:11:57.8301658Z [W1204 09:59:05.742563649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8301662Z 2025-12-04T10:11:57.8301949Z [W1204 09:59:05.746981535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8301953Z 2025-12-04T10:11:57.8302241Z [W1204 09:59:05.747440542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8302248Z 2025-12-04T10:11:57.8302535Z [W1204 09:59:05.747581835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8302571Z 2025-12-04T10:11:57.8302651Z ('RERUN', {'yellow': True}) [0.4176s] [100%] 2025-12-04T10:11:57.8303388Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:06.154333725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8303392Z 2025-12-04T10:11:57.8303680Z [W1204 09:59:06.154869114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8303683Z 2025-12-04T10:11:57.8303971Z [W1204 09:59:06.155015116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8304009Z 2025-12-04T10:11:57.8304298Z [W1204 09:59:06.157886805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8304301Z 2025-12-04T10:11:57.8304589Z [W1204 09:59:06.158442895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8304592Z 2025-12-04T10:11:57.8304879Z [W1204 09:59:06.158581137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8304883Z 2025-12-04T10:11:57.8305173Z [W1204 09:59:06.163033363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8305176Z 2025-12-04T10:11:57.8305460Z [W1204 09:59:06.163497971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8305466Z 2025-12-04T10:11:57.8305755Z [W1204 09:59:06.163637784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8305760Z 2025-12-04T10:11:57.8305820Z FAILED [0.4131s] [100%] 2025-12-04T10:11:57.8305824Z 2025-12-04T10:11:57.8305905Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8306202Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8306275Z Traceback (most recent call last): 2025-12-04T10:11:57.8306628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8306700Z method(*args, **kwargs) 2025-12-04T10:11:57.8306998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8307097Z method(*args, **kwargs) 2025-12-04T10:11:57.8307388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8307449Z with policy(): 2025-12-04T10:11:57.8307755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8307819Z raise RuntimeError(msg) 2025-12-04T10:11:57.8308641Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8308645Z 2025-12-04T10:11:57.8308781Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8309306Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8309314Z 2025-12-04T10:11:57.8309512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8309644Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8309744Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8310098Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8310233Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8310296Z graph_break [] 2025-12-04T10:11:57.8310420Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8311118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8311225Z if out == self.unknown_value: 2025-12-04T10:11:57.8311524Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8311600Z Traceback (most recent call last): 2025-12-04T10:11:57.8311898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8311967Z method(*args, **kwargs) 2025-12-04T10:11:57.8312259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8312322Z method(*args, **kwargs) 2025-12-04T10:11:57.8312616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8312678Z with policy(): 2025-12-04T10:11:57.8312975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8313044Z raise RuntimeError(msg) 2025-12-04T10:11:57.8313861Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8313865Z 2025-12-04T10:11:57.8314047Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8314566Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8314605Z 2025-12-04T10:11:57.8314768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8314891Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8314985Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8315338Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8315465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8315523Z graph_break [] 2025-12-04T10:11:57.8315663Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8316357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8316432Z if out == self.unknown_value: 2025-12-04T10:11:57.8316552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8316680Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8316807Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8317311Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8317374Z graph_break [] 2025-12-04T10:11:57.8317460Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8317755Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8317832Z Traceback (most recent call last): 2025-12-04T10:11:57.8318132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8318262Z method(*args, **kwargs) 2025-12-04T10:11:57.8318562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8318626Z method(*args, **kwargs) 2025-12-04T10:11:57.8318921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8318983Z with policy(): 2025-12-04T10:11:57.8319293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8319363Z raise RuntimeError(msg) 2025-12-04T10:11:57.8320223Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8320231Z 2025-12-04T10:11:57.8320359Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8320878Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8320882Z 2025-12-04T10:11:57.8321046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8321223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8321316Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8321659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8321831Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8321889Z graph_break [] 2025-12-04T10:11:57.8322014Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8322704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8322777Z if out == self.unknown_value: 2025-12-04T10:11:57.8322899Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8322990Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8323116Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8323458Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8323521Z graph_break [] 2025-12-04T10:11:57.8323641Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8323775Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8323900Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8324239Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8324295Z graph_break [] 2025-12-04T10:11:57.8324798Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml - 2025-12-04T10:11:57.8324899Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8326199Z FAILED [0.4131s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8326241Z 2025-12-04T10:11:57.8326367Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8326895Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8326898Z 2025-12-04T10:11:57.8327056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8327160Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8327281Z ================== 1 failed, 57 deselected, 2 rerun in 11.79s ================== 2025-12-04T10:11:57.8327338Z Got exit code 1 2025-12-04T10:11:57.8327400Z Retrying single test... 2025-12-04T10:11:57.8327661Z W1204 09:59:12.814000 57710 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8328047Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml 2025-12-04T10:11:57.8328148Z ============================= test session starts ============================== 2025-12-04T10:11:57.8328391Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8328467Z cachedir: .pytest_cache 2025-12-04T10:11:57.8328773Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8328886Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8328953Z configfile: pytest.ini 2025-12-04T10:11:57.8329270Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8329400Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8329982Z stepcurrent: skipping 41 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8330054Z Running 1 items in this shard 2025-12-04T10:11:57.8330057Z 2025-12-04T10:11:57.8330796Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:13.879775141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8330803Z 2025-12-04T10:11:57.8331101Z [W1204 09:59:23.030145615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8331104Z 2025-12-04T10:11:57.8331435Z [W1204 09:59:23.030408309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8331439Z 2025-12-04T10:11:57.8331727Z [W1204 09:59:23.036151716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8331731Z 2025-12-04T10:11:57.8332018Z [W1204 09:59:23.036753345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8332021Z 2025-12-04T10:11:57.8332306Z [W1204 09:59:23.036927117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8332343Z 2025-12-04T10:11:57.8332630Z [W1204 09:59:23.042346509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8332635Z 2025-12-04T10:11:57.8332926Z [W1204 09:59:23.042915727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8332929Z 2025-12-04T10:11:57.8333213Z [W1204 09:59:23.043088610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8333216Z 2025-12-04T10:11:57.8333301Z ('RERUN', {'yellow': True}) [11.0206s] [100%] 2025-12-04T10:11:57.8334032Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:24.216615905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8334038Z 2025-12-04T10:11:57.8334330Z [W1204 09:59:24.217140242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8334333Z 2025-12-04T10:11:57.8334620Z [W1204 09:59:24.217280644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8334624Z 2025-12-04T10:11:57.8334914Z [W1204 09:59:24.220239479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8334917Z 2025-12-04T10:11:57.8335237Z [W1204 09:59:24.220807948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8335240Z 2025-12-04T10:11:57.8335529Z [W1204 09:59:24.220948740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8335567Z 2025-12-04T10:11:57.8335857Z [W1204 09:59:24.225550099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8335860Z 2025-12-04T10:11:57.8336148Z [W1204 09:59:24.226010016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8336151Z 2025-12-04T10:11:57.8336441Z [W1204 09:59:24.226148768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8336444Z 2025-12-04T10:11:57.8336523Z ('RERUN', {'yellow': True}) [0.4125s] [100%] 2025-12-04T10:11:57.8337255Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:24.626037983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8337260Z 2025-12-04T10:11:57.8337549Z [W1204 09:59:24.626556760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8337552Z 2025-12-04T10:11:57.8337891Z [W1204 09:59:24.626696423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8337894Z 2025-12-04T10:11:57.8338184Z [W1204 09:59:24.629556406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8338188Z 2025-12-04T10:11:57.8338486Z [W1204 09:59:24.630121644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8338489Z 2025-12-04T10:11:57.8338777Z [W1204 09:59:24.630265476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8338782Z 2025-12-04T10:11:57.8339133Z [W1204 09:59:24.634715413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8339139Z 2025-12-04T10:11:57.8339428Z [W1204 09:59:24.635164390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8339432Z 2025-12-04T10:11:57.8339719Z [W1204 09:59:24.635298712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8339722Z 2025-12-04T10:11:57.8339785Z FAILED [0.4034s] [100%] 2025-12-04T10:11:57.8339788Z 2025-12-04T10:11:57.8339870Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8340172Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8340245Z Traceback (most recent call last): 2025-12-04T10:11:57.8340551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8340626Z method(*args, **kwargs) 2025-12-04T10:11:57.8340922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8340990Z method(*args, **kwargs) 2025-12-04T10:11:57.8341281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8341338Z with policy(): 2025-12-04T10:11:57.8341678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8341748Z raise RuntimeError(msg) 2025-12-04T10:11:57.8342550Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8342593Z 2025-12-04T10:11:57.8342723Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8343243Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8343247Z 2025-12-04T10:11:57.8343410Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8343537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8343634Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8343982Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8344113Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8344174Z graph_break [] 2025-12-04T10:11:57.8344298Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8345024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8345099Z if out == self.unknown_value: 2025-12-04T10:11:57.8345392Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8345470Z Traceback (most recent call last): 2025-12-04T10:11:57.8345770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8345833Z method(*args, **kwargs) 2025-12-04T10:11:57.8346168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8346229Z method(*args, **kwargs) 2025-12-04T10:11:57.8346524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8346583Z with policy(): 2025-12-04T10:11:57.8346877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8346945Z raise RuntimeError(msg) 2025-12-04T10:11:57.8347766Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8347771Z 2025-12-04T10:11:57.8347900Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8348429Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8348432Z 2025-12-04T10:11:57.8348587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8348714Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8348811Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8349200Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8349327Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8349385Z graph_break [] 2025-12-04T10:11:57.8349544Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8350240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8350315Z if out == self.unknown_value: 2025-12-04T10:11:57.8350437Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8350527Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8350653Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8350997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8351054Z graph_break [] 2025-12-04T10:11:57.8351140Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8351430Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8351541Z Traceback (most recent call last): 2025-12-04T10:11:57.8351844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8351907Z method(*args, **kwargs) 2025-12-04T10:11:57.8352201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8352263Z method(*args, **kwargs) 2025-12-04T10:11:57.8352552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8352612Z with policy(): 2025-12-04T10:11:57.8352906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8353008Z raise RuntimeError(msg) 2025-12-04T10:11:57.8353824Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8353828Z 2025-12-04T10:11:57.8353955Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8354477Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8354481Z 2025-12-04T10:11:57.8354637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8354767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8354860Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8355207Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8355328Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8355384Z graph_break [] 2025-12-04T10:11:57.8355507Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8356229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8356301Z if out == self.unknown_value: 2025-12-04T10:11:57.8356422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8356548Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8356673Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8357013Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8357070Z graph_break [] 2025-12-04T10:11:57.8357195Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8357281Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8357400Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8357739Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8357796Z graph_break [] 2025-12-04T10:11:57.8358294Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml - 2025-12-04T10:11:57.8358395Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8360029Z FAILED [0.4034s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8360038Z 2025-12-04T10:11:57.8360192Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8360728Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8360770Z 2025-12-04T10:11:57.8360934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8361045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8361164Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ================== 2025-12-04T10:11:57.8361227Z Got exit code 1 2025-12-04T10:11:57.8361707Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8361955Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8362221Z W1204 09:59:31.282000 57903 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8362610Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml 2025-12-04T10:11:57.8362707Z ============================= test session starts ============================== 2025-12-04T10:11:57.8362915Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8362988Z cachedir: .pytest_cache 2025-12-04T10:11:57.8363296Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8363375Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8363474Z configfile: pytest.ini 
2025-12-04T10:11:57.8363793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8363929Z collecting ... collected 58 items / 42 deselected / 16 selected 2025-12-04T10:11:57.8364051Z stepcurrent: skipping 42 already run items. 2025-12-04T10:11:57.8364123Z Running 16 items in this shard 2025-12-04T10:11:57.8364130Z 2025-12-04T10:11:57.8364633Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9473s] [ 6%] 2025-12-04T10:11:57.8365121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5434s] [ 6%] 2025-12-04T10:11:57.8365566Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5327s] [ 6%] 2025-12-04T10:11:57.8365570Z 2025-12-04T10:11:57.8365653Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8365949Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8366024Z Traceback (most recent call last): 2025-12-04T10:11:57.8366393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8366466Z method(*args, **kwargs) 2025-12-04T10:11:57.8366761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8366825Z method(*args, **kwargs) 2025-12-04T10:11:57.8367119Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8367181Z with policy(): 2025-12-04T10:11:57.8367477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8367540Z raise RuntimeError(msg) 2025-12-04T10:11:57.8368343Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8368387Z 2025-12-04T10:11:57.8368514Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8369036Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8369040Z 2025-12-04T10:11:57.8369202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8369330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8369427Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8369979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8370107Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8370168Z graph_break [] 2025-12-04T10:11:57.8370454Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.8370531Z Traceback (most recent call last): 2025-12-04T10:11:57.8370866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8370931Z method(*args, **kwargs) 2025-12-04T10:11:57.8371225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8371320Z method(*args, **kwargs) 2025-12-04T10:11:57.8371613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8371675Z with policy(): 2025-12-04T10:11:57.8371971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8372039Z raise RuntimeError(msg) 2025-12-04T10:11:57.8372848Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8372853Z 2025-12-04T10:11:57.8372976Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8373499Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8373506Z 2025-12-04T10:11:57.8373698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8373828Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8373919Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8374461Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8374591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8374649Z graph_break [] 2025-12-04T10:11:57.8374774Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8374863Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8375016Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8375554Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8375611Z graph_break [] 2025-12-04T10:11:57.8375708Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8376005Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8376079Z 
Traceback (most recent call last): 2025-12-04T10:11:57.8376380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8376442Z method(*args, **kwargs) 2025-12-04T10:11:57.8376734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8376798Z method(*args, **kwargs) 2025-12-04T10:11:57.8377087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8377147Z with policy(): 2025-12-04T10:11:57.8377440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8377505Z raise RuntimeError(msg) 2025-12-04T10:11:57.8378364Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8378400Z 2025-12-04T10:11:57.8378527Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8379051Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8379054Z 2025-12-04T10:11:57.8379208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8379334Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8379423Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8379963Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8380091Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8380150Z graph_break [] 2025-12-04T10:11:57.8380271Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8380360Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8380513Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8381050Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8381110Z graph_break [] 2025-12-04T10:11:57.8381233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8381325Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8381444Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.8381978Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8382073Z graph_break [] 2025-12-04T10:11:57.8382574Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml - 2025-12-04T10:11:57.8382679Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8383974Z FAILED [0.5327s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8383982Z 2025-12-04T10:11:57.8384106Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8384627Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8384631Z 2025-12-04T10:11:57.8384788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8384891Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8385043Z ================== 1 failed, 42 deselected, 2 rerun in 3.05s =================== 2025-12-04T10:11:57.8385105Z Got exit code 1 2025-12-04T10:11:57.8385169Z Retrying single test... 
2025-12-04T10:11:57.8385428Z W1204 09:59:40.950000 58092 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8385850Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml 2025-12-04T10:11:57.8385944Z ============================= test session starts ============================== 2025-12-04T10:11:57.8386157Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8386222Z cachedir: .pytest_cache 2025-12-04T10:11:57.8386529Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8386607Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8386675Z configfile: pytest.ini 2025-12-04T10:11:57.8386994Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8387121Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8387696Z stepcurrent: skipping 42 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8387808Z Running 1 items in this shard 2025-12-04T10:11:57.8387812Z 2025-12-04T10:11:57.8388542Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:59:42.539445427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8388546Z 2025-12-04T10:11:57.8388858Z [W1204 09:59:51.765851091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8388862Z 2025-12-04T10:11:57.8389160Z [W1204 09:59:51.766099266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8389199Z 2025-12-04T10:11:57.8389496Z [W1204 09:59:51.772779021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8389500Z 2025-12-04T10:11:57.8389787Z [W1204 09:59:51.773391321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8389790Z 2025-12-04T10:11:57.8390078Z [W1204 09:59:51.773575885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8390082Z 2025-12-04T10:11:57.8390368Z [W1204 09:59:51.778993598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8390372Z 2025-12-04T10:11:57.8390658Z [W1204 09:59:51.779515607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8390663Z 2025-12-04T10:11:57.8390957Z [W1204 09:59:51.779688380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8390960Z 2025-12-04T10:11:57.8391042Z ('RERUN', {'yellow': True}) [11.1748s] [100%] 2025-12-04T10:11:57.8391766Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:59:52.586513809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8391771Z 2025-12-04T10:11:57.8392094Z [W1204 09:59:52.587029988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8392098Z 2025-12-04T10:11:57.8392388Z [W1204 09:59:52.587169200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8392442Z 2025-12-04T10:11:57.8392732Z [W1204 09:59:52.590145731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8392735Z 2025-12-04T10:11:57.8393029Z [W1204 09:59:52.590610169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8393032Z 2025-12-04T10:11:57.8393319Z [W1204 09:59:52.590748591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8393322Z 2025-12-04T10:11:57.8393608Z [W1204 09:59:52.595332370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8393615Z 2025-12-04T10:11:57.8393901Z [W1204 09:59:52.595791688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8393905Z 2025-12-04T10:11:57.8394200Z [W1204 09:59:52.595929581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8394205Z 2025-12-04T10:11:57.8394325Z ('RERUN', {'yellow': True}) [0.5024s] [100%] 2025-12-04T10:11:57.8395050Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:59:53.087294841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8395055Z 2025-12-04T10:11:57.8395349Z [W1204 09:59:53.087812480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8395352Z 2025-12-04T10:11:57.8395639Z [W1204 09:59:53.087952533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8395676Z 2025-12-04T10:11:57.8395966Z [W1204 09:59:53.090904453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8395969Z 2025-12-04T10:11:57.8396256Z [W1204 09:59:53.091365691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8396260Z 2025-12-04T10:11:57.8396547Z [W1204 09:59:53.091502304 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8396550Z 2025-12-04T10:11:57.8396841Z [W1204 09:59:53.096014761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8396844Z 2025-12-04T10:11:57.8397130Z [W1204 09:59:53.096485460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8397138Z 2025-12-04T10:11:57.8397427Z [W1204 09:59:53.096622032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8397430Z 2025-12-04T10:11:57.8397490Z FAILED [0.5024s] [100%] 2025-12-04T10:11:57.8397495Z 2025-12-04T10:11:57.8397578Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8397873Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8397946Z Traceback (most recent call last): 2025-12-04T10:11:57.8398287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8398352Z method(*args, **kwargs) 2025-12-04T10:11:57.8398652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8398749Z method(*args, **kwargs) 2025-12-04T10:11:57.8399039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8399098Z with policy(): 2025-12-04T10:11:57.8399395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8399464Z raise RuntimeError(msg) 2025-12-04T10:11:57.8400297Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8400302Z 2025-12-04T10:11:57.8400429Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8400950Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8400957Z 2025-12-04T10:11:57.8401114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8401285Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8401380Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8401924Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8402055Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8402113Z graph_break [] 2025-12-04T10:11:57.8402241Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8402933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8403113Z if out == self.unknown_value: 2025-12-04T10:11:57.8403412Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8403486Z Traceback (most recent call last): 2025-12-04T10:11:57.8403788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8403858Z method(*args, **kwargs) 2025-12-04T10:11:57.8404152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8404216Z method(*args, **kwargs) 2025-12-04T10:11:57.8404504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8404569Z with policy(): 2025-12-04T10:11:57.8404864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8404931Z raise RuntimeError(msg) 2025-12-04T10:11:57.8405740Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8405745Z 2025-12-04T10:11:57.8405906Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8406428Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8406466Z 2025-12-04T10:11:57.8406624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8406758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8406859Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8407399Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8407527Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8407590Z graph_break [] 2025-12-04T10:11:57.8407712Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8408413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8408484Z if out == self.unknown_value: 2025-12-04T10:11:57.8408645Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8408736Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8408860Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8409409Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8409468Z graph_break [] 2025-12-04T10:11:57.8409550Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8409843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8409950Z Traceback (most recent call last): 2025-12-04T10:11:57.8410252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8410371Z method(*args, **kwargs) 2025-12-04T10:11:57.8410699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8414874Z method(*args, **kwargs) 2025-12-04T10:11:57.8415254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8415324Z with policy(): 2025-12-04T10:11:57.8415650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8415722Z raise RuntimeError(msg) 2025-12-04T10:11:57.8416554Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8416568Z 2025-12-04T10:11:57.8416703Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8417453Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8417458Z 2025-12-04T10:11:57.8417740Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8417890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8417993Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8418540Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8418734Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8418799Z graph_break [] 2025-12-04T10:11:57.8418927Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8419633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8419707Z if out == self.unknown_value: 2025-12-04T10:11:57.8419832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8419933Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8420063Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8420662Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8420726Z graph_break [] 2025-12-04T10:11:57.8420848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8420940Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8421073Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8421610Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8421673Z graph_break [] 2025-12-04T10:11:57.8422179Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml - 2025-12-04T10:11:57.8422332Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8423639Z FAILED [0.5024s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8423643Z 2025-12-04T10:11:57.8423783Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8424309Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8424316Z 2025-12-04T10:11:57.8424482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8424591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8424707Z ================== 1 failed, 57 deselected, 2 rerun in 12.20s ================== 2025-12-04T10:11:57.8424769Z Got exit code 1 2025-12-04T10:11:57.8424834Z Retrying single test... 2025-12-04T10:11:57.8425140Z W1204 09:59:59.747000 58286 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8425536Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml 2025-12-04T10:11:57.8425631Z ============================= test session starts ============================== 2025-12-04T10:11:57.8425882Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8425948Z cachedir: .pytest_cache 2025-12-04T10:11:57.8426257Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8426339Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8426404Z configfile: pytest.ini 2025-12-04T10:11:57.8426722Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8426860Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8427437Z stepcurrent: skipping 42 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8427513Z Running 1 items in this shard 2025-12-04T10:11:57.8427517Z 2025-12-04T10:11:57.8428287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:00:01.344704453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8428292Z 2025-12-04T10:11:57.8428592Z [W1204 10:00:10.280196486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8428596Z 2025-12-04T10:11:57.8428887Z [W1204 10:00:10.280469020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8428891Z 2025-12-04T10:11:57.8429177Z [W1204 10:00:10.286489393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8429220Z 2025-12-04T10:11:57.8429508Z [W1204 10:00:10.287096063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8429511Z 2025-12-04T10:11:57.8429802Z [W1204 10:00:10.287271195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8429806Z 2025-12-04T10:11:57.8430094Z [W1204 10:00:10.292758731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8430098Z 2025-12-04T10:11:57.8430384Z [W1204 10:00:10.293290469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8430387Z 2025-12-04T10:11:57.8430675Z [W1204 10:00:10.293449542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8430680Z 2025-12-04T10:11:57.8430762Z ('RERUN', {'yellow': True}) [10.8914s] [100%] 2025-12-04T10:11:57.8431487Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:00:11.096759827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8431491Z 2025-12-04T10:11:57.8431796Z [W1204 10:00:11.097268604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8431800Z 2025-12-04T10:11:57.8432131Z [W1204 10:00:11.097411647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8432135Z 2025-12-04T10:11:57.8432423Z [W1204 10:00:11.100366272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8432459Z 2025-12-04T10:11:57.8432749Z [W1204 10:00:11.100820640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8432752Z 2025-12-04T10:11:57.8433040Z [W1204 10:00:11.100962242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8433043Z 2025-12-04T10:11:57.8433327Z [W1204 10:00:11.105480752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8433330Z 2025-12-04T10:11:57.8433618Z [W1204 10:00:11.105939180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8433621Z 2025-12-04T10:11:57.8433909Z [W1204 10:00:11.106079422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8433912Z 2025-12-04T10:11:57.8433994Z ('RERUN', {'yellow': True}) [0.4999s] [100%] 2025-12-04T10:11:57.8434753Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:00:11.594781890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8434757Z 2025-12-04T10:11:57.8435048Z [W1204 10:00:11.595299538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8435051Z 2025-12-04T10:11:57.8435339Z [W1204 10:00:11.595438960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8435342Z 2025-12-04T10:11:57.8435626Z [W1204 10:00:11.598361456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8435633Z 2025-12-04T10:11:57.8435916Z [W1204 10:00:11.598814843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8435953Z 2025-12-04T10:11:57.8436241Z [W1204 10:00:11.598952105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8436244Z 2025-12-04T10:11:57.8436531Z [W1204 10:00:11.603521827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8436534Z 2025-12-04T10:11:57.8436821Z [W1204 10:00:11.603982964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8436824Z 2025-12-04T10:11:57.8437112Z [W1204 10:00:11.604123236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8437115Z 2025-12-04T10:11:57.8437178Z FAILED [0.4938s] [100%] 2025-12-04T10:11:57.8437183Z 2025-12-04T10:11:57.8437270Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8437567Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8437642Z Traceback (most recent call last): 2025-12-04T10:11:57.8437952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8438017Z method(*args, **kwargs) 2025-12-04T10:11:57.8438311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8438413Z method(*args, **kwargs) 2025-12-04T10:11:57.8438702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8438764Z with policy(): 2025-12-04T10:11:57.8439090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8439156Z raise RuntimeError(msg) 2025-12-04T10:11:57.8440024Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8440029Z 2025-12-04T10:11:57.8440163Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8440694Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8440698Z 2025-12-04T10:11:57.8440859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8440993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8441089Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8441702Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8441838Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8441899Z graph_break [] 2025-12-04T10:11:57.8442023Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8442727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8442798Z if out == self.unknown_value: 2025-12-04T10:11:57.8443122Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8443196Z Traceback (most recent call last): 2025-12-04T10:11:57.8443502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8443577Z method(*args, **kwargs) 2025-12-04T10:11:57.8443868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8443932Z method(*args, **kwargs) 2025-12-04T10:11:57.8444220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8444279Z with policy(): 2025-12-04T10:11:57.8444577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8444643Z raise RuntimeError(msg) 2025-12-04T10:11:57.8445453Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8445462Z 2025-12-04T10:11:57.8445588Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8446153Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8446157Z 2025-12-04T10:11:57.8446321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8446447Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8446579Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8447124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8447252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8447319Z graph_break [] 2025-12-04T10:11:57.8447439Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8448132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8448204Z if out == self.unknown_value: 2025-12-04T10:11:57.8448327Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8448421Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8448544Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8449117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8449176Z graph_break [] 2025-12-04T10:11:57.8449260Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8449552Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8449623Z Traceback (most recent call last): 2025-12-04T10:11:57.8449919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8450018Z method(*args, **kwargs) 2025-12-04T10:11:57.8450306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8450376Z method(*args, **kwargs) 2025-12-04T10:11:57.8450667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8450723Z with policy(): 2025-12-04T10:11:57.8451015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8451080Z raise RuntimeError(msg) 2025-12-04T10:11:57.8451892Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8451899Z 2025-12-04T10:11:57.8452023Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8452545Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8452552Z 2025-12-04T10:11:57.8452707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8452827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8452927Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8453505Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8453660Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8453722Z graph_break [] 2025-12-04T10:11:57.8453842Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8454531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8454598Z if out == self.unknown_value: 2025-12-04T10:11:57.8454718Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8454810Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8454934Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8455475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8455536Z graph_break [] 2025-12-04T10:11:57.8455657Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8455801Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8455925Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8456459Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8456519Z graph_break [] 2025-12-04T10:11:57.8457011Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml - 2025-12-04T10:11:57.8457113Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8458437Z FAILED [0.4938s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8458442Z 2025-12-04T10:11:57.8458565Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8459080Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8459083Z 2025-12-04T10:11:57.8459241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8459346Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8459459Z ================== 1 failed, 57 deselected, 2 rerun in 11.91s ================== 2025-12-04T10:11:57.8459518Z Got exit code 1 2025-12-04T10:11:57.8459993Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8460233Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8460546Z W1204 10:00:18.180000 58480 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8460938Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml 2025-12-04T10:11:57.8461073Z ============================= test session starts ============================== 2025-12-04T10:11:57.8461283Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8461350Z cachedir: .pytest_cache 2025-12-04T10:11:57.8461666Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8461744Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8461812Z configfile: pytest.ini 2025-12-04T10:11:57.8462122Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8462249Z collecting ... collected 58 items / 43 deselected / 15 selected 2025-12-04T10:11:57.8462340Z stepcurrent: skipping 43 already run items. 2025-12-04T10:11:57.8462408Z Running 15 items in this shard 2025-12-04T10:11:57.8462411Z 2025-12-04T10:11:57.8462918Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8663s] [ 6%] 2025-12-04T10:11:57.8463450Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4743s] [ 6%] 2025-12-04T10:11:57.8463895Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4659s] [ 6%] 2025-12-04T10:11:57.8463899Z 2025-12-04T10:11:57.8463985Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8464280Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8464364Z Traceback (most recent call last): 2025-12-04T10:11:57.8464673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8464777Z method(*args, **kwargs) 2025-12-04T10:11:57.8465075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.8465136Z method(*args, **kwargs) 2025-12-04T10:11:57.8465423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8465484Z with policy(): 2025-12-04T10:11:57.8465778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8465845Z raise RuntimeError(msg) 2025-12-04T10:11:57.8466650Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8466656Z 2025-12-04T10:11:57.8466787Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8467306Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8467311Z 2025-12-04T10:11:57.8467467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8467634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8467729Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8468089Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8468272Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8468330Z graph_break [] 2025-12-04T10:11:57.8468624Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.8468698Z Traceback (most recent call last): 2025-12-04T10:11:57.8468995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8469065Z method(*args, **kwargs) 2025-12-04T10:11:57.8469355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8469422Z method(*args, **kwargs) 2025-12-04T10:11:57.8469708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8469768Z with policy(): 2025-12-04T10:11:57.8470064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8470129Z raise RuntimeError(msg) 2025-12-04T10:11:57.8470990Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8470996Z 2025-12-04T10:11:57.8471124Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8471645Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8471653Z 2025-12-04T10:11:57.8471807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8471971Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8472065Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8472414Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8472537Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8472600Z graph_break [] 2025-12-04T10:11:57.8472722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8472820Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8472938Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8473278Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8473342Z graph_break [] 2025-12-04T10:11:57.8473434Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8473730Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8473807Z Traceback (most recent call last): 2025-12-04T10:11:57.8474107Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8474178Z method(*args, **kwargs) 2025-12-04T10:11:57.8474503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8474567Z method(*args, **kwargs) 2025-12-04T10:11:57.8474856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8474947Z with policy(): 2025-12-04T10:11:57.8475238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8475304Z raise RuntimeError(msg) 2025-12-04T10:11:57.8476127Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8476131Z 2025-12-04T10:11:57.8476257Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8476774Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8476778Z 2025-12-04T10:11:57.8476935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8477059Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8477146Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8477523Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8477646Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8477705Z graph_break [] 2025-12-04T10:11:57.8477824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8477912Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8478036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8478379Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8478472Z graph_break [] 2025-12-04T10:11:57.8478596Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8478682Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8478801Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8479137Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8479196Z graph_break [] 2025-12-04T10:11:57.8479685Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml - 2025-12-04T10:11:57.8479784Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8481133Z FAILED [0.4659s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8481141Z 2025-12-04T10:11:57.8481264Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8481823Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8481828Z 2025-12-04T10:11:57.8481983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8482088Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8482243Z ================== 1 failed, 43 deselected, 2 rerun in 2.83s =================== 2025-12-04T10:11:57.8482300Z Got exit code 1 2025-12-04T10:11:57.8482363Z Retrying single test... 
2025-12-04T10:11:57.8482631Z W1204 10:00:27.860000 58668 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8483011Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml 2025-12-04T10:11:57.8483107Z ============================= test session starts ============================== 2025-12-04T10:11:57.8483313Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8483377Z cachedir: .pytest_cache 2025-12-04T10:11:57.8483684Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8483762Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8483827Z configfile: pytest.ini 2025-12-04T10:11:57.8484174Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8484305Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8484877Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8484947Z Running 1 items in this shard 2025-12-04T10:11:57.8484951Z 2025-12-04T10:11:57.8485689Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:29.938355325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8485739Z 2025-12-04T10:11:57.8486040Z [W1204 10:00:37.879377728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8486045Z 2025-12-04T10:11:57.8486338Z [W1204 10:00:37.879628212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8486342Z 2025-12-04T10:11:57.8486626Z [W1204 10:00:37.885654335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8486630Z 2025-12-04T10:11:57.8486915Z [W1204 10:00:37.886245885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8486923Z 2025-12-04T10:11:57.8487208Z [W1204 10:00:37.886417048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8487214Z 2025-12-04T10:11:57.8487498Z [W1204 10:00:37.892095495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8487501Z 2025-12-04T10:11:57.8487789Z [W1204 10:00:37.892646885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8487793Z 2025-12-04T10:11:57.8488079Z [W1204 10:00:37.892810487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8488082Z 2025-12-04T10:11:57.8488164Z ('RERUN', {'yellow': True}) [10.8210s] [100%] 2025-12-04T10:11:57.8488926Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:39.094988004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8488963Z 2025-12-04T10:11:57.8489254Z [W1204 10:00:39.095513644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8489258Z 2025-12-04T10:11:57.8489545Z [W1204 10:00:39.095655816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8489549Z 2025-12-04T10:11:57.8489833Z [W1204 10:00:39.098611577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8489839Z 2025-12-04T10:11:57.8490127Z [W1204 10:00:39.099169856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8490130Z 2025-12-04T10:11:57.8490418Z [W1204 10:00:39.099310009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8490423Z 2025-12-04T10:11:57.8490712Z [W1204 10:00:39.103891207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8490716Z 2025-12-04T10:11:57.8491034Z [W1204 10:00:39.104365085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8491037Z 2025-12-04T10:11:57.8491328Z [W1204 10:00:39.104502748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8491331Z 2025-12-04T10:11:57.8491409Z ('RERUN', {'yellow': True}) [0.4479s] [100%] 2025-12-04T10:11:57.8492132Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:39.540359883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8492170Z 2025-12-04T10:11:57.8492458Z [W1204 10:00:39.540910543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8492461Z 2025-12-04T10:11:57.8492752Z [W1204 10:00:39.541047715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8492755Z 2025-12-04T10:11:57.8493040Z [W1204 10:00:39.543910795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8493043Z 2025-12-04T10:11:57.8493334Z [W1204 10:00:39.544462234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8493344Z 2025-12-04T10:11:57.8493630Z [W1204 10:00:39.544599467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8493635Z 2025-12-04T10:11:57.8493920Z [W1204 10:00:39.549075024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8493923Z 2025-12-04T10:11:57.8494212Z [W1204 10:00:39.549522841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8494215Z 2025-12-04T10:11:57.8494501Z [W1204 10:00:39.549656574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8494504Z 2025-12-04T10:11:57.8494565Z FAILED [0.4432s] [100%] 2025-12-04T10:11:57.8494569Z 2025-12-04T10:11:57.8494701Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8494997Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8495071Z Traceback (most recent call last): 2025-12-04T10:11:57.8495408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8495477Z method(*args, **kwargs) 2025-12-04T10:11:57.8495769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8495833Z method(*args, **kwargs) 2025-12-04T10:11:57.8496130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8496187Z with policy(): 2025-12-04T10:11:57.8496481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8496545Z raise RuntimeError(msg) 2025-12-04T10:11:57.8497348Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8497355Z 2025-12-04T10:11:57.8497516Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8498036Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8498039Z 2025-12-04T10:11:57.8498196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8498326Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8498418Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8498767Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8498928Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8498991Z graph_break [] 2025-12-04T10:11:57.8499114Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8499807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8499880Z if out == self.unknown_value: 2025-12-04T10:11:57.8500174Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8500249Z Traceback (most recent call last): 2025-12-04T10:11:57.8500543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8500618Z method(*args, **kwargs) 2025-12-04T10:11:57.8500915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8500977Z method(*args, **kwargs) 2025-12-04T10:11:57.8501266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8501327Z with policy(): 2025-12-04T10:11:57.8501617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8501684Z raise RuntimeError(msg) 2025-12-04T10:11:57.8502538Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8502576Z 2025-12-04T10:11:57.8502703Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8503229Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8503233Z 2025-12-04T10:11:57.8503386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8503514Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8503606Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8503951Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8504078Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8504135Z graph_break [] 2025-12-04T10:11:57.8504264Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8504987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8505072Z if out == self.unknown_value: 2025-12-04T10:11:57.8505194Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8505283Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8505409Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8505748Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8505808Z graph_break [] 2025-12-04T10:11:57.8505892Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8506217Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8506293Z Traceback (most recent call last): 2025-12-04T10:11:57.8506587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8506650Z method(*args, **kwargs) 2025-12-04T10:11:57.8506940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8507001Z method(*args, **kwargs) 2025-12-04T10:11:57.8507297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8507356Z with policy(): 2025-12-04T10:11:57.8507649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8507719Z raise RuntimeError(msg) 2025-12-04T10:11:57.8508541Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8508545Z 2025-12-04T10:11:57.8508673Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8509226Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8509230Z 2025-12-04T10:11:57.8509389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8509509Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8509631Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8509981Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8510101Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8510157Z graph_break [] 2025-12-04T10:11:57.8510278Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8510964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8511033Z if out == self.unknown_value: 2025-12-04T10:11:57.8511151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8511240Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8511362Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8511737Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8511798Z graph_break [] 2025-12-04T10:11:57.8511917Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8512002Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8512122Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8512460Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8512515Z graph_break [] 2025-12-04T10:11:57.8513001Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml - 2025-12-04T10:11:57.8513213Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8514513Z FAILED [0.4432s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8514521Z 2025-12-04T10:11:57.8514644Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8515167Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8515174Z 2025-12-04T10:11:57.8515326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8515429Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8515548Z ================== 1 failed, 57 deselected, 2 rerun in 11.74s ================== 2025-12-04T10:11:57.8515606Z Got exit code 1 2025-12-04T10:11:57.8515671Z Retrying single test... 2025-12-04T10:11:57.8515969Z W1204 10:00:46.228000 58861 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8516354Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml 2025-12-04T10:11:57.8516448Z ============================= test session starts ============================== 2025-12-04T10:11:57.8516688Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8516757Z cachedir: .pytest_cache 2025-12-04T10:11:57.8517276Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8517354Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8517424Z configfile: pytest.ini 2025-12-04T10:11:57.8517740Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8517868Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8518442Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8518513Z Running 1 items in this shard 2025-12-04T10:11:57.8518519Z 2025-12-04T10:11:57.8519321Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:47.324713450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8519328Z 2025-12-04T10:11:57.8519630Z [W1204 10:00:56.362970795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8519633Z 2025-12-04T10:11:57.8519966Z [W1204 10:00:56.363228080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8519970Z 2025-12-04T10:11:57.8520258Z [W1204 10:00:56.369128890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8520263Z 2025-12-04T10:11:57.8520555Z [W1204 10:00:56.369708280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8520630Z 2025-12-04T10:11:57.8520922Z [W1204 10:00:56.369890383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8520925Z 2025-12-04T10:11:57.8521212Z [W1204 10:00:56.375390567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8521218Z 2025-12-04T10:11:57.8521504Z [W1204 10:00:56.375933896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8521507Z 2025-12-04T10:11:57.8521792Z [W1204 10:00:56.376088048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8521796Z 2025-12-04T10:11:57.8521879Z ('RERUN', {'yellow': True}) [10.9390s] [100%] 2025-12-04T10:11:57.8522607Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:57.585092379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8522610Z 2025-12-04T10:11:57.8522908Z [W1204 10:00:57.585628447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8522913Z 2025-12-04T10:11:57.8523252Z [W1204 10:00:57.585769559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8523256Z 2025-12-04T10:11:57.8523546Z [W1204 10:00:57.588701926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8523550Z 2025-12-04T10:11:57.8523880Z [W1204 10:00:57.589269435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8523885Z 2025-12-04T10:11:57.8524174Z [W1204 10:00:57.589410168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8524177Z 2025-12-04T10:11:57.8524462Z [W1204 10:00:57.594006280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8524465Z 2025-12-04T10:11:57.8524749Z [W1204 10:00:57.594473348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8524754Z 2025-12-04T10:11:57.8525043Z [W1204 10:00:57.594610300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8525046Z 2025-12-04T10:11:57.8525122Z ('RERUN', {'yellow': True}) [0.4529s] [100%] 2025-12-04T10:11:57.8525889Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:58.035982556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8525893Z 2025-12-04T10:11:57.8526181Z [W1204 10:00:58.036526614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8526184Z 2025-12-04T10:11:57.8526475Z [W1204 10:00:58.036667017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8526480Z 2025-12-04T10:11:57.8526764Z [W1204 10:00:58.039550743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8526767Z 2025-12-04T10:11:57.8527055Z [W1204 10:00:58.040128392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8527101Z 2025-12-04T10:11:57.8527393Z [W1204 10:00:58.040272584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8527396Z 2025-12-04T10:11:57.8527681Z [W1204 10:00:58.044736425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8527687Z 2025-12-04T10:11:57.8527970Z [W1204 10:00:58.045188453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8527973Z 2025-12-04T10:11:57.8528258Z [W1204 10:00:58.045324695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8528262Z 2025-12-04T10:11:57.8528325Z FAILED [0.4477s] [100%] 2025-12-04T10:11:57.8528330Z 2025-12-04T10:11:57.8528417Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8528711Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8528784Z Traceback (most recent call last): 2025-12-04T10:11:57.8529087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8529160Z method(*args, **kwargs) 2025-12-04T10:11:57.8529452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8529558Z method(*args, **kwargs) 2025-12-04T10:11:57.8529849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8529907Z with policy(): 2025-12-04T10:11:57.8530200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8530302Z raise RuntimeError(msg) 2025-12-04T10:11:57.8531103Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8531109Z 2025-12-04T10:11:57.8531238Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8531762Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8531765Z 2025-12-04T10:11:57.8531924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8532051Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8532148Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8532529Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8532658Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8532718Z graph_break [] 2025-12-04T10:11:57.8532838Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8533529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8533601Z if out == self.unknown_value: 2025-12-04T10:11:57.8533892Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8534007Z Traceback (most recent call last): 2025-12-04T10:11:57.8534302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8534365Z method(*args, **kwargs) 2025-12-04T10:11:57.8534659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8534721Z method(*args, **kwargs) 2025-12-04T10:11:57.8535021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8535083Z with policy(): 2025-12-04T10:11:57.8535371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8535436Z raise RuntimeError(msg) 2025-12-04T10:11:57.8536264Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8536269Z 2025-12-04T10:11:57.8536395Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8536913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8536917Z 2025-12-04T10:11:57.8537107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8537235Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8537325Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8537709Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8537854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8537914Z graph_break [] 2025-12-04T10:11:57.8538039Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8538728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8538799Z if out == self.unknown_value: 2025-12-04T10:11:57.8538921Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8539009Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8539134Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8539482Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8539574Z graph_break [] 2025-12-04T10:11:57.8539663Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8539954Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8540030Z Traceback (most recent call last): 2025-12-04T10:11:57.8540327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8540390Z method(*args, **kwargs) 2025-12-04T10:11:57.8540683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8540744Z method(*args, **kwargs) 2025-12-04T10:11:57.8541067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8541126Z with policy(): 2025-12-04T10:11:57.8541418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8541486Z raise RuntimeError(msg) 2025-12-04T10:11:57.8542306Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8542310Z 2025-12-04T10:11:57.8542438Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8542957Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8542964Z 2025-12-04T10:11:57.8543122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8543245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8543333Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8543676Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8543831Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8543891Z graph_break [] 2025-12-04T10:11:57.8544014Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8544697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8544803Z if out == self.unknown_value: 2025-12-04T10:11:57.8544923Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8545013Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8545137Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8545474Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8545532Z graph_break [] 2025-12-04T10:11:57.8545655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8545740Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8545862Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8546201Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8546312Z graph_break [] 2025-12-04T10:11:57.8546806Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml - 2025-12-04T10:11:57.8546907Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8548213Z FAILED [0.4477s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8548257Z 2025-12-04T10:11:57.8548380Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8548903Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8548906Z 2025-12-04T10:11:57.8549059Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8549164Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8549282Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ================== 2025-12-04T10:11:57.8549340Z Got exit code 1 2025-12-04T10:11:57.8549812Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8550057Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8550316Z W1204 10:01:04.690000 59054 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8550702Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml 2025-12-04T10:11:57.8550796Z ============================= test session starts ============================== 2025-12-04T10:11:57.8551036Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8551102Z cachedir: .pytest_cache 2025-12-04T10:11:57.8551405Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8551529Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8551594Z configfile: 
pytest.ini 2025-12-04T10:11:57.8551908Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8552037Z collecting ... collected 58 items / 44 deselected / 14 selected 2025-12-04T10:11:57.8552123Z stepcurrent: skipping 44 already run items. 2025-12-04T10:11:57.8552197Z Running 14 items in this shard 2025-12-04T10:11:57.8552203Z 2025-12-04T10:11:57.8552700Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8805s] [ 7%] 2025-12-04T10:11:57.8553182Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4589s] [ 7%] 2025-12-04T10:11:57.8553622Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4604s] [ 7%] 2025-12-04T10:11:57.8553628Z 2025-12-04T10:11:57.8553751Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8554049Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8554122Z Traceback (most recent call last): 2025-12-04T10:11:57.8554428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8554496Z method(*args, **kwargs) 2025-12-04T10:11:57.8554787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8554850Z method(*args, **kwargs) 2025-12-04T10:11:57.8555137Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8555232Z with policy(): 2025-12-04T10:11:57.8555528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8555596Z raise RuntimeError(msg) 2025-12-04T10:11:57.8556387Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8556393Z 2025-12-04T10:11:57.8556519Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8557035Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8557042Z 2025-12-04T10:11:57.8557200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8557326Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8557421Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8557766Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8557892Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8557952Z graph_break [] 2025-12-04T10:11:57.8558273Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8558348Z Traceback (most recent call last): 2025-12-04T10:11:57.8558645Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8558740Z method(*args, **kwargs) 2025-12-04T10:11:57.8559031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8559092Z method(*args, **kwargs) 2025-12-04T10:11:57.8559377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8559437Z with policy(): 2025-12-04T10:11:57.8559725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8559792Z raise RuntimeError(msg) 2025-12-04T10:11:57.8560662Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8560671Z 2025-12-04T10:11:57.8560795Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8561354Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8561358Z 2025-12-04T10:11:57.8561513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8561639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8561730Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8562071Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8562196Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8562288Z graph_break [] 2025-12-04T10:11:57.8562410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8562502Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8562624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8562979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8563040Z graph_break [] 2025-12-04T10:11:57.8563125Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8563416Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8563488Z Traceback (most recent call last): 2025-12-04T10:11:57.8563787Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8563857Z method(*args, **kwargs) 2025-12-04T10:11:57.8564144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8564211Z method(*args, **kwargs) 2025-12-04T10:11:57.8564496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8564555Z with policy(): 2025-12-04T10:11:57.8564846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8564910Z raise RuntimeError(msg) 2025-12-04T10:11:57.8565764Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8565801Z 2025-12-04T10:11:57.8565928Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8566448Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8566452Z 2025-12-04T10:11:57.8566608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8566732Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8566830Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8567183Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8567311Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8567374Z graph_break [] 2025-12-04T10:11:57.8567493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8567582Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8567745Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8568084Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8568145Z graph_break [] 2025-12-04T10:11:57.8568265Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8568354Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8568472Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8568805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8568903Z graph_break [] 2025-12-04T10:11:57.8569390Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml - 2025-12-04T10:11:57.8569491Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8570771Z FAILED [0.4604s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8570776Z 2025-12-04T10:11:57.8570903Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8571422Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8571426Z 2025-12-04T10:11:57.8571581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8571686Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8571800Z ================== 1 failed, 44 deselected, 2 rerun in 2.82s =================== 2025-12-04T10:11:57.8571860Z Got exit code 1 2025-12-04T10:11:57.8571957Z Retrying single test... 
2025-12-04T10:11:57.8572231Z W1204 10:01:14.338000 59235 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8572620Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml 2025-12-04T10:11:57.8572767Z ============================= test session starts ============================== 2025-12-04T10:11:57.8572983Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8573051Z cachedir: .pytest_cache 2025-12-04T10:11:57.8573362Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8573441Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8573503Z configfile: pytest.ini 2025-12-04T10:11:57.8573823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8573956Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8574532Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8574608Z Running 1 items in this shard 2025-12-04T10:11:57.8574612Z 2025-12-04T10:11:57.8575375Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:15.390594130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8575381Z 2025-12-04T10:11:57.8575685Z [W1204 10:01:24.553909126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8575692Z 2025-12-04T10:11:57.8575985Z [W1204 10:01:24.554198271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8575988Z 2025-12-04T10:11:57.8576275Z [W1204 10:01:24.559967450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8576313Z 2025-12-04T10:11:57.8576607Z [W1204 10:01:24.560591811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8576610Z 2025-12-04T10:11:57.8576898Z [W1204 10:01:24.560778714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8576901Z 2025-12-04T10:11:57.8577192Z [W1204 10:01:24.566218548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8577197Z 2025-12-04T10:11:57.8577482Z [W1204 10:01:24.566763667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8577486Z 2025-12-04T10:11:57.8577775Z [W1204 10:01:24.566939020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8577780Z 2025-12-04T10:11:57.8577861Z ('RERUN', {'yellow': True}) [11.0205s] [100%] 2025-12-04T10:11:57.8578585Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:25.738889840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8578589Z 2025-12-04T10:11:57.8578874Z [W1204 10:01:25.739449529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8578914Z 2025-12-04T10:11:57.8579200Z [W1204 10:01:25.739592461 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8579206Z 2025-12-04T10:11:57.8579487Z [W1204 10:01:25.742609773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8579525Z 2025-12-04T10:11:57.8579811Z [W1204 10:01:25.743182543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8579814Z 2025-12-04T10:11:57.8580115Z [W1204 10:01:25.743323826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8580118Z 2025-12-04T10:11:57.8580402Z [W1204 10:01:25.747902335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8580405Z 2025-12-04T10:11:57.8580693Z [W1204 10:01:25.748373163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8580696Z 2025-12-04T10:11:57.8580979Z [W1204 10:01:25.748513095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8580986Z 2025-12-04T10:11:57.8581066Z ('RERUN', {'yellow': True}) [0.4208s] [100%] 2025-12-04T10:11:57.8581818Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:26.157378251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8581822Z 2025-12-04T10:11:57.8582114Z [W1204 10:01:26.157916600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8582117Z 2025-12-04T10:11:57.8582416Z [W1204 10:01:26.158059193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8582419Z 2025-12-04T10:11:57.8582709Z [W1204 10:01:26.161015894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8582764Z 2025-12-04T10:11:57.8583052Z [W1204 10:01:26.161565463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8583057Z 2025-12-04T10:11:57.8583343Z [W1204 10:01:26.161704656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8583346Z 2025-12-04T10:11:57.8583634Z [W1204 10:01:26.166186573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8583638Z 2025-12-04T10:11:57.8583924Z [W1204 10:01:26.166649351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8583927Z 2025-12-04T10:11:57.8584216Z [W1204 10:01:26.166785503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8584223Z 2025-12-04T10:11:57.8584284Z FAILED [0.4136s] [100%] 2025-12-04T10:11:57.8584287Z 2025-12-04T10:11:57.8584372Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8584673Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8584747Z Traceback (most recent call last): 2025-12-04T10:11:57.8585061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8585124Z method(*args, **kwargs) 2025-12-04T10:11:57.8585456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8585523Z method(*args, **kwargs) 2025-12-04T10:11:57.8585811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8585908Z with policy(): 2025-12-04T10:11:57.8586202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8586267Z raise RuntimeError(msg) 2025-12-04T10:11:57.8587072Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8587077Z 2025-12-04T10:11:57.8587206Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8587729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8587735Z 2025-12-04T10:11:57.8587894Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8588032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8588166Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8588519Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8588648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8588704Z graph_break [] 2025-12-04T10:11:57.8588827Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8589524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8589629Z if out == self.unknown_value: 2025-12-04T10:11:57.8589925Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8589998Z Traceback (most recent call last): 2025-12-04T10:11:57.8590296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8590362Z method(*args, **kwargs) 2025-12-04T10:11:57.8590650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8590713Z method(*args, **kwargs) 2025-12-04T10:11:57.8591003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8591060Z with policy(): 2025-12-04T10:11:57.8591369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8591439Z raise RuntimeError(msg) 2025-12-04T10:11:57.8592251Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8592258Z 2025-12-04T10:11:57.8592383Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8592937Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8592941Z 2025-12-04T10:11:57.8593106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8593230Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8593361Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8593714Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8593840Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8593900Z graph_break [] 2025-12-04T10:11:57.8594021Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8594712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8594783Z if out == self.unknown_value: 2025-12-04T10:11:57.8594903Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8594999Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8595121Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8595499Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8595570Z graph_break [] 2025-12-04T10:11:57.8595653Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8595945Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8596016Z Traceback (most recent call last): 2025-12-04T10:11:57.8596313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8596379Z method(*args, **kwargs) 2025-12-04T10:11:57.8596673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8596768Z method(*args, **kwargs) 2025-12-04T10:11:57.8597063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8597120Z with policy(): 2025-12-04T10:11:57.8597417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8597485Z raise RuntimeError(msg) 2025-12-04T10:11:57.8598295Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8598299Z 2025-12-04T10:11:57.8598426Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8598946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8598950Z 2025-12-04T10:11:57.8599104Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8599226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8599313Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8599711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8599833Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8599928Z graph_break [] 2025-12-04T10:11:57.8600052Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8600776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8600848Z if out == self.unknown_value: 2025-12-04T10:11:57.8600979Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8601072Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8601193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8601533Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8601592Z graph_break [] 2025-12-04T10:11:57.8601710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8601801Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8601921Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8602300Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8602362Z graph_break [] 2025-12-04T10:11:57.8602849Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml - 2025-12-04T10:11:57.8602949Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8604232Z FAILED [0.4136s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8604274Z 2025-12-04T10:11:57.8604399Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8604915Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8604919Z 2025-12-04T10:11:57.8605073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8605180Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8605294Z ================== 1 failed, 57 deselected, 2 rerun in 11.88s ================== 2025-12-04T10:11:57.8605352Z Got exit code 1 2025-12-04T10:11:57.8605421Z Retrying single test... 2025-12-04T10:11:57.8605681Z W1204 10:01:32.812000 59421 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8606071Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml 2025-12-04T10:11:57.8606164Z ============================= test session starts ============================== 2025-12-04T10:11:57.8606376Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8606443Z cachedir: .pytest_cache 2025-12-04T10:11:57.8606783Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8606859Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8606925Z configfile: pytest.ini 2025-12-04T10:11:57.8607250Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8607419Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8607993Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8608062Z Running 1 items in this shard 2025-12-04T10:11:57.8608065Z 2025-12-04T10:11:57.8608793Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:33.859292993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8608797Z 2025-12-04T10:11:57.8609095Z [W1204 10:01:42.890559173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8609101Z 2025-12-04T10:11:57.8609399Z [W1204 10:01:42.890797517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8609402Z 2025-12-04T10:11:57.8609729Z [W1204 10:01:42.896597156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8609733Z 2025-12-04T10:11:57.8610027Z [W1204 10:01:42.897143645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8610030Z 2025-12-04T10:11:57.8610317Z [W1204 10:01:42.897327789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8610320Z 2025-12-04T10:11:57.8610607Z [W1204 10:01:42.902841603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8610612Z 2025-12-04T10:11:57.8610931Z [W1204 10:01:42.903391293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8610935Z 2025-12-04T10:11:57.8611225Z [W1204 10:01:42.903564586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8611228Z 2025-12-04T10:11:57.8611309Z ('RERUN', {'yellow': True}) [10.8872s] [100%] 2025-12-04T10:11:57.8612028Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:44.080884402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8612033Z 2025-12-04T10:11:57.8612326Z [W1204 10:01:44.081435572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8612333Z 2025-12-04T10:11:57.8612618Z [W1204 10:01:44.081578694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8612621Z 2025-12-04T10:11:57.8612915Z [W1204 10:01:44.084549815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8612918Z 2025-12-04T10:11:57.8613205Z [W1204 10:01:44.085119195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8613208Z 2025-12-04T10:11:57.8613531Z [W1204 10:01:44.085262037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8613534Z 2025-12-04T10:11:57.8613820Z [W1204 10:01:44.089829805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8613856Z 2025-12-04T10:11:57.8614147Z [W1204 10:01:44.090319683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8614151Z 2025-12-04T10:11:57.8614438Z [W1204 10:01:44.090460826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8614442Z 2025-12-04T10:11:57.8614523Z ('RERUN', {'yellow': True}) [0.4190s] [100%] 2025-12-04T10:11:57.8615243Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:44.494546687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8615247Z 2025-12-04T10:11:57.8615535Z [W1204 10:01:44.495114257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8615540Z 2025-12-04T10:11:57.8615829Z [W1204 10:01:44.495256889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8615832Z 2025-12-04T10:11:57.8616151Z [W1204 10:01:44.498258410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8616155Z 2025-12-04T10:11:57.8616449Z [W1204 10:01:44.498823130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8616452Z 2025-12-04T10:11:57.8616741Z [W1204 10:01:44.498960733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8616744Z 2025-12-04T10:11:57.8617185Z [W1204 10:01:44.503576872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8617190Z 2025-12-04T10:11:57.8617485Z [W1204 10:01:44.504035090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8617551Z 2025-12-04T10:11:57.8617846Z [W1204 10:01:44.504169742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8617849Z 2025-12-04T10:11:57.8617911Z FAILED [0.4135s] [100%] 2025-12-04T10:11:57.8617915Z 2025-12-04T10:11:57.8617998Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8618294Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8618369Z Traceback (most recent call last): 2025-12-04T10:11:57.8618686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8618750Z method(*args, **kwargs) 2025-12-04T10:11:57.8619042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8619116Z method(*args, **kwargs) 2025-12-04T10:11:57.8619416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8619475Z with policy(): 2025-12-04T10:11:57.8619771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8619836Z raise RuntimeError(msg) 2025-12-04T10:11:57.8620685Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8620689Z 2025-12-04T10:11:57.8620861Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8621379Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8621385Z 2025-12-04T10:11:57.8621540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8621666Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8621761Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8622109Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8622235Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8622296Z graph_break [] 2025-12-04T10:11:57.8622417Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8623242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8623313Z if out == self.unknown_value: 2025-12-04T10:11:57.8623602Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8623674Z Traceback (most recent call last): 2025-12-04T10:11:57.8623973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8624037Z method(*args, **kwargs) 2025-12-04T10:11:57.8624329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8624390Z method(*args, **kwargs) 2025-12-04T10:11:57.8624746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8624803Z with policy(): 2025-12-04T10:11:57.8625098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8625168Z raise RuntimeError(msg) 2025-12-04T10:11:57.8625974Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8625979Z 2025-12-04T10:11:57.8626104Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8626618Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8626625Z 2025-12-04T10:11:57.8626781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8626905Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8626997Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8627354Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8627480Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8627817Z graph_break [] 2025-12-04T10:11:57.8627953Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8628642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8628751Z if out == self.unknown_value: 2025-12-04T10:11:57.8628874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8628965Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8629092Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8629510Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8629573Z graph_break [] 2025-12-04T10:11:57.8629675Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8629966Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8630045Z Traceback (most recent call last): 2025-12-04T10:11:57.8630340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8630401Z method(*args, **kwargs) 2025-12-04T10:11:57.8630727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8630789Z method(*args, **kwargs) 2025-12-04T10:11:57.8631081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8631138Z with policy(): 2025-12-04T10:11:57.8631429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8631496Z raise RuntimeError(msg) 2025-12-04T10:11:57.8632302Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8632342Z 2025-12-04T10:11:57.8632470Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8632986Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8632990Z 2025-12-04T10:11:57.8633147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8633270Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8633369Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8633715Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8633839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8633897Z graph_break [] 2025-12-04T10:11:57.8634020Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8634704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8634773Z if out == self.unknown_value: 2025-12-04T10:11:57.8634929Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8635017Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8635143Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8635483Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8635577Z graph_break [] 2025-12-04T10:11:57.8635701Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8635788Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8635907Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8636244Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8636302Z graph_break [] 2025-12-04T10:11:57.8636794Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml - 2025-12-04T10:11:57.8636893Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8638217Z FAILED [0.4135s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8638222Z 2025-12-04T10:11:57.8638348Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8638865Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8638868Z 2025-12-04T10:11:57.8639022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8639160Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8639278Z ================== 1 failed, 57 deselected, 2 rerun in 11.74s ================== 2025-12-04T10:11:57.8639336Z Got exit code 1 2025-12-04T10:11:57.8639807Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8640105Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8640369Z W1204 10:01:51.166000 59607 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8640754Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml 2025-12-04T10:11:57.8640848Z ============================= test session starts ============================== 2025-12-04T10:11:57.8641059Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8641124Z cachedir: .pytest_cache 2025-12-04T10:11:57.8641430Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8641506Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8641571Z configfile: pytest.ini 
2025-12-04T10:11:57.8641888Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8642054Z collecting ... collected 58 items / 45 deselected / 13 selected 2025-12-04T10:11:57.8642144Z stepcurrent: skipping 45 already run items. 2025-12-04T10:11:57.8642216Z Running 13 items in this shard 2025-12-04T10:11:57.8642220Z 2025-12-04T10:11:57.8642712Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9207s] [ 7%] 2025-12-04T10:11:57.8643237Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5308s] [ 7%] 2025-12-04T10:11:57.8643677Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5265s] [ 7%] 2025-12-04T10:11:57.8643680Z 2025-12-04T10:11:57.8643761Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8644067Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8644142Z Traceback (most recent call last): 2025-12-04T10:11:57.8644449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8644516Z method(*args, **kwargs) 2025-12-04T10:11:57.8644846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8644913Z method(*args, **kwargs) 2025-12-04T10:11:57.8645205Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8645262Z with policy(): 2025-12-04T10:11:57.8645560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8645626Z raise RuntimeError(msg) 2025-12-04T10:11:57.8646428Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8646468Z 2025-12-04T10:11:57.8646594Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8647115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8647119Z 2025-12-04T10:11:57.8647274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8647399Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8647492Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8648035Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8648167Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8648224Z graph_break [] 2025-12-04T10:11:57.8648513Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.8648590Z Traceback (most recent call last): 2025-12-04T10:11:57.8648887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8648947Z method(*args, **kwargs) 2025-12-04T10:11:57.8649282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8649345Z method(*args, **kwargs) 2025-12-04T10:11:57.8649640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8649697Z with policy(): 2025-12-04T10:11:57.8650038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8650107Z raise RuntimeError(msg) 2025-12-04T10:11:57.8650912Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8650917Z 2025-12-04T10:11:57.8651039Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8651551Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8651555Z 2025-12-04T10:11:57.8651707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8651835Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8651924Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8652516Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8652641Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8652698Z graph_break [] 2025-12-04T10:11:57.8652827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8652914Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8653036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8653569Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8653673Z graph_break [] 2025-12-04T10:11:57.8653761Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8654048Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8654119Z 
Traceback (most recent call last): 2025-12-04T10:11:57.8654418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8654482Z method(*args, **kwargs) 2025-12-04T10:11:57.8654777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8654839Z method(*args, **kwargs) 2025-12-04T10:11:57.8655126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8655192Z with policy(): 2025-12-04T10:11:57.8655485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8655555Z raise RuntimeError(msg) 2025-12-04T10:11:57.8656363Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8656403Z 2025-12-04T10:11:57.8656528Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8657045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8657083Z 2025-12-04T10:11:57.8657237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8657367Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8657454Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8657993Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8658117Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8658174Z graph_break [] 2025-12-04T10:11:57.8658299Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8658385Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8658504Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8659073Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8659131Z graph_break [] 2025-12-04T10:11:57.8659256Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8659340Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8659457Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.8659999Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8660063Z graph_break [] 2025-12-04T10:11:57.8660552Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml - 2025-12-04T10:11:57.8660693Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8661977Z FAILED [0.5265s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8661985Z 2025-12-04T10:11:57.8662106Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8662620Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8662626Z 2025-12-04T10:11:57.8662784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8662888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8663004Z ================== 1 failed, 45 deselected, 2 rerun in 3.00s =================== 2025-12-04T10:11:57.8663062Z Got exit code 1 2025-12-04T10:11:57.8663125Z Retrying single test... 
2025-12-04T10:11:57.8663421Z W1204 10:02:00.782000 59789 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8663803Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml 2025-12-04T10:11:57.8663895Z ============================= test session starts ============================== 2025-12-04T10:11:57.8664138Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8664202Z cachedir: .pytest_cache 2025-12-04T10:11:57.8664514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8664587Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8664651Z configfile: pytest.ini 2025-12-04T10:11:57.8664966Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8665095Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8665662Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8665735Z Running 1 items in this shard 2025-12-04T10:11:57.8665740Z 2025-12-04T10:11:57.8666500Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:02.358959509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8666505Z 2025-12-04T10:11:57.8666805Z [W1204 10:02:11.221429124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8666808Z 2025-12-04T10:11:57.8667100Z [W1204 10:02:11.221698429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8667103Z 2025-12-04T10:11:57.8667396Z [W1204 10:02:11.227782462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8667400Z 2025-12-04T10:11:57.8667722Z [W1204 10:02:11.228396562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8667725Z 2025-12-04T10:11:57.8668021Z [W1204 10:02:11.228585156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8668026Z 2025-12-04T10:11:57.8668320Z [W1204 10:02:11.234108710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8668323Z 2025-12-04T10:11:57.8668611Z [W1204 10:02:11.234659119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8668615Z 2025-12-04T10:11:57.8668901Z [W1204 10:02:11.234828582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8668905Z 2025-12-04T10:11:57.8668983Z ('RERUN', {'yellow': True}) [10.7946s] [100%] 2025-12-04T10:11:57.8669717Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:12.028548844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8669721Z 2025-12-04T10:11:57.8670007Z [W1204 10:02:12.029077823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8670010Z 2025-12-04T10:11:57.8670333Z [W1204 10:02:12.029220096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8670337Z 2025-12-04T10:11:57.8670623Z [W1204 10:02:12.032160845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8670628Z 2025-12-04T10:11:57.8670947Z [W1204 10:02:12.032617723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8670952Z 2025-12-04T10:11:57.8671241Z [W1204 10:02:12.032754675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8671244Z 2025-12-04T10:11:57.8671533Z [W1204 10:02:12.037228941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8671536Z 2025-12-04T10:11:57.8671824Z [W1204 10:02:12.037677568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8671827Z 2025-12-04T10:11:57.8672118Z [W1204 10:02:12.037815561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8672121Z 2025-12-04T10:11:57.8672198Z ('RERUN', {'yellow': True}) [0.4937s] [100%] 2025-12-04T10:11:57.8672955Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:12.521523918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8672959Z 2025-12-04T10:11:57.8673252Z [W1204 10:02:12.522047657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8673255Z 2025-12-04T10:11:57.8673546Z [W1204 10:02:12.522188479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8673549Z 2025-12-04T10:11:57.8673837Z [W1204 10:02:12.525072228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8673840Z 2025-12-04T10:11:57.8674128Z [W1204 10:02:12.525517825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8674165Z 2025-12-04T10:11:57.8674455Z [W1204 10:02:12.525656498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8674458Z 2025-12-04T10:11:57.8674742Z [W1204 10:02:12.530153474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8674745Z 2025-12-04T10:11:57.8675037Z [W1204 10:02:12.530610562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8675044Z 2025-12-04T10:11:57.8675338Z [W1204 10:02:12.530748014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8675341Z 2025-12-04T10:11:57.8675401Z FAILED [0.4904s] [100%] 2025-12-04T10:11:57.8675406Z 2025-12-04T10:11:57.8675494Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8675786Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8675862Z Traceback (most recent call last): 2025-12-04T10:11:57.8676164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8676227Z method(*args, **kwargs) 2025-12-04T10:11:57.8676523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8676623Z method(*args, **kwargs) 2025-12-04T10:11:57.8676916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8676975Z with policy(): 2025-12-04T10:11:57.8677269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8677391Z raise RuntimeError(msg) 2025-12-04T10:11:57.8678190Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8678194Z 2025-12-04T10:11:57.8678323Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8678853Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8678857Z 2025-12-04T10:11:57.8679017Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8679148Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8679240Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8679839Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8680006Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8680065Z graph_break [] 2025-12-04T10:11:57.8680189Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8680883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8680957Z if out == self.unknown_value: 2025-12-04T10:11:57.8681285Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8681357Z Traceback (most recent call last): 2025-12-04T10:11:57.8681659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8681720Z method(*args, **kwargs) 2025-12-04T10:11:57.8682008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8682076Z method(*args, **kwargs) 2025-12-04T10:11:57.8682364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8682425Z with policy(): 2025-12-04T10:11:57.8682716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8682783Z raise RuntimeError(msg) 2025-12-04T10:11:57.8683593Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8683597Z 2025-12-04T10:11:57.8683720Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8684279Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8684284Z 2025-12-04T10:11:57.8684440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8684567Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8684692Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8685241Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8685369Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8685425Z graph_break [] 2025-12-04T10:11:57.8685545Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8686235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8686305Z if out == self.unknown_value: 2025-12-04T10:11:57.8686427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8686519Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8686639Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8687213Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8687272Z graph_break [] 2025-12-04T10:11:57.8687359Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8687649Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8687721Z Traceback (most recent call last): 2025-12-04T10:11:57.8688020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8688090Z method(*args, **kwargs) 2025-12-04T10:11:57.8688418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8688496Z method(*args, **kwargs) 2025-12-04T10:11:57.8688787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8688849Z with policy(): 2025-12-04T10:11:57.8689140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8689206Z raise RuntimeError(msg) 2025-12-04T10:11:57.8690020Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8690027Z 2025-12-04T10:11:57.8690149Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8690667Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8690670Z 2025-12-04T10:11:57.8690822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8690947Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8691050Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8691844Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8691985Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8692082Z graph_break [] 2025-12-04T10:11:57.8692210Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8692901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8692969Z if out == self.unknown_value: 2025-12-04T10:11:57.8693095Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8693186Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8693308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8693847Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8693907Z graph_break [] 2025-12-04T10:11:57.8694031Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8694164Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8694288Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8694828Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8694886Z graph_break [] 2025-12-04T10:11:57.8695376Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml - 2025-12-04T10:11:57.8695476Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8696800Z FAILED [0.4904s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8696808Z 2025-12-04T10:11:57.8696935Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8697452Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8697457Z 2025-12-04T10:11:57.8697621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8697727Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8697843Z ================== 1 failed, 57 deselected, 2 rerun in 11.80s ================== 2025-12-04T10:11:57.8697902Z Got exit code 1 2025-12-04T10:11:57.8697965Z Retrying single test... 2025-12-04T10:11:57.8698243Z W1204 10:02:19.180000 59975 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8698623Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml 2025-12-04T10:11:57.8698750Z ============================= test session starts ============================== 2025-12-04T10:11:57.8698962Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8699028Z cachedir: .pytest_cache 2025-12-04T10:11:57.8699332Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8699443Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8699508Z configfile: pytest.ini 2025-12-04T10:11:57.8699843Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8699975Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8700544Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8700616Z Running 1 items in this shard 2025-12-04T10:11:57.8700620Z 2025-12-04T10:11:57.8701344Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:20.767520727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8701356Z 2025-12-04T10:11:57.8701691Z [W1204 10:02:29.889076346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8701695Z 2025-12-04T10:11:57.8701989Z [W1204 10:02:29.889348501 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8701992Z 2025-12-04T10:11:57.8702284Z [W1204 10:02:29.895409844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8702289Z 2025-12-04T10:11:57.8702574Z [W1204 10:02:29.896010345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8702577Z 2025-12-04T10:11:57.8702866Z [W1204 10:02:29.896190538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8702905Z 2025-12-04T10:11:57.8703198Z [W1204 10:02:29.901672291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8703201Z 2025-12-04T10:11:57.8703492Z [W1204 10:02:29.902209240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8703495Z 2025-12-04T10:11:57.8703780Z [W1204 10:02:29.902366433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8703783Z 2025-12-04T10:11:57.8703872Z ('RERUN', {'yellow': True}) [11.0649s] [100%] 2025-12-04T10:11:57.8704590Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:30.704441237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8704597Z 2025-12-04T10:11:57.8704885Z [W1204 10:02:30.704974946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8704888Z 2025-12-04T10:11:57.8705179Z [W1204 10:02:30.705111959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8705182Z 2025-12-04T10:11:57.8705470Z [W1204 10:02:30.707925417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8705473Z 2025-12-04T10:11:57.8705817Z [W1204 10:02:30.708378405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8705821Z 2025-12-04T10:11:57.8706108Z [W1204 10:02:30.708513837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8706147Z 2025-12-04T10:11:57.8706439Z [W1204 10:02:30.713026294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8706444Z 2025-12-04T10:11:57.8706730Z [W1204 10:02:30.713477582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8706734Z 2025-12-04T10:11:57.8707023Z [W1204 10:02:30.713615894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8707026Z 2025-12-04T10:11:57.8707106Z ('RERUN', {'yellow': True}) [0.4995s] [100%] 2025-12-04T10:11:57.8707826Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:31.200790172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8707836Z 2025-12-04T10:11:57.8708163Z [W1204 10:02:31.201334241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8708167Z 2025-12-04T10:11:57.8708453Z [W1204 10:02:31.201473494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8708456Z 2025-12-04T10:11:57.8708747Z [W1204 10:02:31.204290682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8708750Z 2025-12-04T10:11:57.8709040Z [W1204 10:02:31.204743759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8709043Z 2025-12-04T10:11:57.8709337Z [W1204 10:02:31.204883702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8709375Z 2025-12-04T10:11:57.8709663Z [W1204 10:02:31.209247446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8709668Z 2025-12-04T10:11:57.8709956Z [W1204 10:02:31.209693114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8709960Z 2025-12-04T10:11:57.8710244Z [W1204 10:02:31.209827976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8710247Z 2025-12-04T10:11:57.8710309Z FAILED [0.4949s] [100%] 2025-12-04T10:11:57.8710314Z 2025-12-04T10:11:57.8710395Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8710685Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8710765Z Traceback (most recent call last): 2025-12-04T10:11:57.8711077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8711144Z method(*args, **kwargs) 2025-12-04T10:11:57.8711442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8711504Z method(*args, **kwargs) 2025-12-04T10:11:57.8711802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8711861Z with policy(): 2025-12-04T10:11:57.8712191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8712264Z raise RuntimeError(msg) 2025-12-04T10:11:57.8713058Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8713099Z 2025-12-04T10:11:57.8713232Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8713758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8713762Z 2025-12-04T10:11:57.8713923Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8714053Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8714152Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8714705Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8714839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8714932Z graph_break [] 2025-12-04T10:11:57.8715066Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8715756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8715834Z if out == self.unknown_value: 2025-12-04T10:11:57.8716128Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8716203Z Traceback (most recent call last): 2025-12-04T10:11:57.8716510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8716608Z method(*args, **kwargs) 2025-12-04T10:11:57.8716904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8716965Z method(*args, **kwargs) 2025-12-04T10:11:57.8717414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8717479Z with policy(): 2025-12-04T10:11:57.8717777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8717842Z raise RuntimeError(msg) 2025-12-04T10:11:57.8718654Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8718661Z 2025-12-04T10:11:57.8718790Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8719309Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8719312Z 2025-12-04T10:11:57.8719474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8719606Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8719771Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8720405Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8720593Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8720652Z graph_break [] 2025-12-04T10:11:57.8720782Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8721467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8721536Z if out == self.unknown_value: 2025-12-04T10:11:57.8721664Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8721757Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8721880Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8722425Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8722485Z graph_break [] 2025-12-04T10:11:57.8722632Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8722929Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8723001Z Traceback (most recent call last): 2025-12-04T10:11:57.8723305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8723369Z method(*args, **kwargs) 2025-12-04T10:11:57.8723662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8723724Z method(*args, **kwargs) 2025-12-04T10:11:57.8724016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8724125Z with policy(): 2025-12-04T10:11:57.8724421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8724487Z raise RuntimeError(msg) 2025-12-04T10:11:57.8725297Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8725301Z 2025-12-04T10:11:57.8725426Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8725947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8725954Z 2025-12-04T10:11:57.8726111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8726240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8726330Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8726871Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8727033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8727091Z graph_break [] 2025-12-04T10:11:57.8727218Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8727901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8728005Z if out == self.unknown_value: 2025-12-04T10:11:57.8728131Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8728220Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8728347Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8728888Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8728945Z graph_break [] 2025-12-04T10:11:57.8729068Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8729159Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8729279Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8729851Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8729910Z graph_break [] 2025-12-04T10:11:57.8730403Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml - 2025-12-04T10:11:57.8730512Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8731797Z FAILED [0.4949s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8731854Z 2025-12-04T10:11:57.8731978Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8732492Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8732499Z 2025-12-04T10:11:57.8732656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8732759Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8732877Z ================== 1 failed, 57 deselected, 2 rerun in 12.08s ================== 2025-12-04T10:11:57.8732935Z Got exit code 1 2025-12-04T10:11:57.8733406Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8733652Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8733916Z W1204 10:02:37.864000 60162 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8734305Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml 2025-12-04T10:11:57.8734434Z ============================= test session starts ============================== 2025-12-04T10:11:57.8734645Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8734713Z cachedir: .pytest_cache 2025-12-04T10:11:57.8735135Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8735221Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8735286Z configfile: pytest.ini 2025-12-04T10:11:57.8735603Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8735735Z collecting ... collected 58 items / 46 deselected / 12 selected 2025-12-04T10:11:57.8735822Z stepcurrent: skipping 46 already run items. 2025-12-04T10:11:57.8735891Z Running 12 items in this shard 2025-12-04T10:11:57.8735895Z 2025-12-04T10:11:57.8736406Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8851s] [ 8%] 2025-12-04T10:11:57.8736894Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5062s] [ 8%] 2025-12-04T10:11:57.8737387Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5022s] [ 8%] 2025-12-04T10:11:57.8737391Z 2025-12-04T10:11:57.8737472Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8737773Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8737852Z Traceback (most recent call last): 2025-12-04T10:11:57.8738161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8738226Z method(*args, **kwargs) 2025-12-04T10:11:57.8738522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.8738620Z method(*args, **kwargs) 2025-12-04T10:11:57.8738911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8738971Z with policy(): 2025-12-04T10:11:57.8739269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8739334Z raise RuntimeError(msg) 2025-12-04T10:11:57.8740140Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8740148Z 2025-12-04T10:11:57.8740269Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8740792Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8740797Z 2025-12-04T10:11:57.8740964Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8741087Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8741186Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8741537Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8741696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8741759Z graph_break [] 2025-12-04T10:11:57.8742051Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.8742157Z Traceback (most recent call last): 2025-12-04T10:11:57.8742458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8742521Z method(*args, **kwargs) 2025-12-04T10:11:57.8742815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8742876Z method(*args, **kwargs) 2025-12-04T10:11:57.8743163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8743223Z with policy(): 2025-12-04T10:11:57.8743520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8743583Z raise RuntimeError(msg) 2025-12-04T10:11:57.8744409Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8744417Z 2025-12-04T10:11:57.8744572Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8745098Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8745102Z 2025-12-04T10:11:57.8745257Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8745391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8745488Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8745833Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8746000Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8746060Z graph_break [] 2025-12-04T10:11:57.8746184Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8746272Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8746390Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8746735Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8746795Z graph_break [] 2025-12-04T10:11:57.8746878Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8747170Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8747244Z Traceback (most recent call last): 2025-12-04T10:11:57.8747543Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8747606Z method(*args, **kwargs) 2025-12-04T10:11:57.8747898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8747966Z method(*args, **kwargs) 2025-12-04T10:11:57.8748257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8748316Z with policy(): 2025-12-04T10:11:57.8748645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8748711Z raise RuntimeError(msg) 2025-12-04T10:11:57.8749539Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8749579Z 2025-12-04T10:11:57.8749702Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8750223Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8750227Z 2025-12-04T10:11:57.8750387Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8750510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8750603Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8750943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8751070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8751129Z graph_break [] 2025-12-04T10:11:57.8751289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8751382Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8751500Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8751840Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8751902Z graph_break [] 2025-12-04T10:11:57.8752022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8752113Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8752247Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8752622Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8752684Z graph_break [] 2025-12-04T10:11:57.8753166Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml - 2025-12-04T10:11:57.8753266Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8754564Z FAILED [0.5022s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8754571Z 2025-12-04T10:11:57.8754692Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8755213Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8755217Z 2025-12-04T10:11:57.8755367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8755509Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8755627Z ================== 1 failed, 46 deselected, 2 rerun in 2.92s =================== 2025-12-04T10:11:57.8755683Z Got exit code 1 2025-12-04T10:11:57.8755752Z Retrying single test... 
2025-12-04T10:11:57.8756010Z W1204 10:02:47.569000 60351 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8756430Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml 2025-12-04T10:11:57.8756525Z ============================= test session starts ============================== 2025-12-04T10:11:57.8756730Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8756812Z cachedir: .pytest_cache 2025-12-04T10:11:57.8757121Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8757202Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8757267Z configfile: pytest.ini 2025-12-04T10:11:57.8757580Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8757713Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8758335Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8758406Z Running 1 items in this shard 2025-12-04T10:11:57.8758413Z 2025-12-04T10:11:57.8759143Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:02:48.651370774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8759147Z 2025-12-04T10:11:57.8759442Z [W1204 10:02:57.673662749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8759449Z 2025-12-04T10:11:57.8759739Z [W1204 10:02:57.673919823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8759777Z 2025-12-04T10:11:57.8760119Z [W1204 10:02:57.679656281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8760123Z 2025-12-04T10:11:57.8760416Z [W1204 10:02:57.680278892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8760420Z 2025-12-04T10:11:57.8760707Z [W1204 10:02:57.680466835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8760712Z 2025-12-04T10:11:57.8761000Z [W1204 10:02:57.685859637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8761003Z 2025-12-04T10:11:57.8761289Z [W1204 10:02:57.686369796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8761296Z 2025-12-04T10:11:57.8761582Z [W1204 10:02:57.686533359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8761585Z 2025-12-04T10:11:57.8761665Z ('RERUN', {'yellow': True}) [10.9071s] [100%] 2025-12-04T10:11:57.8762392Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:02:58.894702226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8762441Z 2025-12-04T10:11:57.8762728Z [W1204 10:02:58.895228435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8762732Z 2025-12-04T10:11:57.8763030Z [W1204 10:02:58.895371568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8763068Z 2025-12-04T10:11:57.8763358Z [W1204 10:02:58.898309378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8763362Z 2025-12-04T10:11:57.8763647Z [W1204 10:02:58.898870827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8763650Z 2025-12-04T10:11:57.8763942Z [W1204 10:02:58.899012990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8763945Z 2025-12-04T10:11:57.8764234Z [W1204 10:02:58.903556167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8764237Z 2025-12-04T10:11:57.8764529Z [W1204 10:02:58.904016405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8764535Z 2025-12-04T10:11:57.8764819Z [W1204 10:02:58.904152868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8764856Z 2025-12-04T10:11:57.8764939Z ('RERUN', {'yellow': True}) [0.4500s] [100%] 2025-12-04T10:11:57.8765662Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:02:59.341565586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8765666Z 2025-12-04T10:11:57.8765957Z [W1204 10:02:59.342082994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8765965Z 2025-12-04T10:11:57.8766256Z [W1204 10:02:59.342218857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8766293Z 2025-12-04T10:11:57.8766582Z [W1204 10:02:59.345116526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8766586Z 2025-12-04T10:11:57.8766874Z [W1204 10:02:59.345656735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8766877Z 2025-12-04T10:11:57.8767165Z [W1204 10:02:59.345794088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8767168Z 2025-12-04T10:11:57.8767460Z [W1204 10:02:59.350272464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8767464Z 2025-12-04T10:11:57.8767756Z [W1204 10:02:59.350725642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8767762Z 2025-12-04T10:11:57.8768052Z [W1204 10:02:59.350863504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8768057Z 2025-12-04T10:11:57.8768118Z FAILED [0.4474s] [100%] 2025-12-04T10:11:57.8768122Z 2025-12-04T10:11:57.8768206Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8768505Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8768578Z Traceback (most recent call last): 2025-12-04T10:11:57.8768924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8768991Z method(*args, **kwargs) 2025-12-04T10:11:57.8769281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8769383Z method(*args, **kwargs) 2025-12-04T10:11:57.8769671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8769731Z with policy(): 2025-12-04T10:11:57.8770028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8770093Z raise RuntimeError(msg) 2025-12-04T10:11:57.8770898Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8770902Z 2025-12-04T10:11:57.8771027Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8771551Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8771557Z 2025-12-04T10:11:57.8771748Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8771889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8771985Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8772337Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8772465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8772524Z graph_break [] 2025-12-04T10:11:57.8772647Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8773348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8773456Z if out == self.unknown_value: 2025-12-04T10:11:57.8773754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8773827Z Traceback (most recent call last): 2025-12-04T10:11:57.8774122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8774190Z method(*args, **kwargs) 2025-12-04T10:11:57.8774482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8778581Z method(*args, **kwargs) 2025-12-04T10:11:57.8778961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8779043Z with policy(): 2025-12-04T10:11:57.8779371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8779450Z raise RuntimeError(msg) 2025-12-04T10:11:57.8780294Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8780300Z 2025-12-04T10:11:57.8780522Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8781066Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8781112Z 2025-12-04T10:11:57.8781283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8781427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8781529Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8781888Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8782024Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8782085Z graph_break [] 2025-12-04T10:11:57.8782223Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8782937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8783016Z if out == self.unknown_value: 2025-12-04T10:11:57.8783146Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8783282Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8783416Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8783775Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8783836Z graph_break [] 2025-12-04T10:11:57.8783927Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8784231Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8784309Z Traceback (most recent call last): 2025-12-04T10:11:57.8784625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8784730Z method(*args, **kwargs) 2025-12-04T10:11:57.8785032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8785095Z method(*args, **kwargs) 2025-12-04T10:11:57.8785386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8785449Z with policy(): 2025-12-04T10:11:57.8785752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8785821Z raise RuntimeError(msg) 2025-12-04T10:11:57.8786654Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8786662Z 2025-12-04T10:11:57.8786805Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8787337Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8787341Z 2025-12-04T10:11:57.8787502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8787670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8787768Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8788118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8788301Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8788359Z graph_break [] 2025-12-04T10:11:57.8788486Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8789180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8789249Z if out == self.unknown_value: 2025-12-04T10:11:57.8789374Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8789465Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8789588Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8789929Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8789988Z graph_break [] 2025-12-04T10:11:57.8790112Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8790233Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8790355Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8790698Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8790756Z graph_break [] 2025-12-04T10:11:57.8791249Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml - 2025-12-04T10:11:57.8791353Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8792665Z FAILED [0.4474s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8792706Z 2025-12-04T10:11:57.8792834Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8793357Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8793363Z 2025-12-04T10:11:57.8793519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8793636Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8793760Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ================== 2025-12-04T10:11:57.8793817Z Got exit code 1 2025-12-04T10:11:57.8793883Z Retrying single test... 2025-12-04T10:11:57.8794155Z W1204 10:03:05.926000 60544 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8794539Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml 2025-12-04T10:11:57.8794636Z ============================= test session starts ============================== 2025-12-04T10:11:57.8794880Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8794947Z cachedir: .pytest_cache 2025-12-04T10:11:57.8795261Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8795373Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8795436Z configfile: pytest.ini 2025-12-04T10:11:57.8795756Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8795886Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8796458Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8796529Z Running 1 items in this shard 2025-12-04T10:11:57.8796534Z 2025-12-04T10:11:57.8797272Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:03:07.018935365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8797279Z 2025-12-04T10:11:57.8797577Z [W1204 10:03:16.103175713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8797615Z 2025-12-04T10:11:57.8797905Z [W1204 10:03:16.103415017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8797912Z 2025-12-04T10:11:57.8798197Z [W1204 10:03:16.109327078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8798201Z 2025-12-04T10:11:57.8798488Z [W1204 10:03:16.109928018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8798491Z 2025-12-04T10:11:57.8798777Z [W1204 10:03:16.110130522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8798815Z 2025-12-04T10:11:57.8799104Z [W1204 10:03:16.115688946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8799107Z 2025-12-04T10:11:57.8799395Z [W1204 10:03:16.116220806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8799398Z 2025-12-04T10:11:57.8799685Z [W1204 10:03:16.116392618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8799688Z 2025-12-04T10:11:57.8799770Z ('RERUN', {'yellow': True}) [10.9829s] [100%] 2025-12-04T10:11:57.8800575Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:03:17.327093789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8800582Z 2025-12-04T10:11:57.8800880Z [W1204 10:03:17.327627138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8800884Z 2025-12-04T10:11:57.8801172Z [W1204 10:03:17.327770071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8801176Z 2025-12-04T10:11:57.8801460Z [W1204 10:03:17.330773202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8801466Z 2025-12-04T10:11:57.8801789Z [W1204 10:03:17.331343092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8801793Z 2025-12-04T10:11:57.8802079Z [W1204 10:03:17.331481404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8802117Z 2025-12-04T10:11:57.8802425Z [W1204 10:03:17.336092403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8802429Z 2025-12-04T10:11:57.8802727Z [W1204 10:03:17.336570891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8802730Z 2025-12-04T10:11:57.8803024Z [W1204 10:03:17.336711313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8803028Z 2025-12-04T10:11:57.8803107Z ('RERUN', {'yellow': True}) [0.4521s] [100%] 2025-12-04T10:11:57.8803851Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:03:17.778970486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8803858Z 2025-12-04T10:11:57.8804150Z [W1204 10:03:17.779491975 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8804153Z 2025-12-04T10:11:57.8804475Z [W1204 10:03:17.779637507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8804482Z 2025-12-04T10:11:57.8804772Z [W1204 10:03:17.782599498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8804776Z 2025-12-04T10:11:57.8805067Z [W1204 10:03:17.783158828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8805070Z 2025-12-04T10:11:57.8805362Z [W1204 10:03:17.783297440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8805411Z 2025-12-04T10:11:57.8805706Z [W1204 10:03:17.787808107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8805709Z 2025-12-04T10:11:57.8806000Z [W1204 10:03:17.788263525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8806003Z 2025-12-04T10:11:57.8806290Z [W1204 10:03:17.788408387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8806293Z 2025-12-04T10:11:57.8806356Z FAILED [0.4504s] [100%] 2025-12-04T10:11:57.8806360Z 2025-12-04T10:11:57.8806445Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8806744Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8806833Z Traceback (most recent call last): 2025-12-04T10:11:57.8807155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8807221Z method(*args, **kwargs) 2025-12-04T10:11:57.8807516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8807583Z method(*args, **kwargs) 2025-12-04T10:11:57.8807878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8807938Z with policy(): 2025-12-04T10:11:57.8808269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8808338Z raise RuntimeError(msg) 2025-12-04T10:11:57.8809148Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8809187Z 2025-12-04T10:11:57.8809323Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8809851Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8809854Z 2025-12-04T10:11:57.8810024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8810156Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8810250Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8810607Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8810738Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8810800Z graph_break [] 2025-12-04T10:11:57.8810957Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8811656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8811728Z if out == self.unknown_value: 2025-12-04T10:11:57.8812024Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8812097Z Traceback (most recent call last): 2025-12-04T10:11:57.8812397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8812460Z method(*args, **kwargs) 2025-12-04T10:11:57.8812788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8812851Z method(*args, **kwargs) 2025-12-04T10:11:57.8813139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8813200Z with policy(): 2025-12-04T10:11:57.8813492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8813560Z raise RuntimeError(msg) 2025-12-04T10:11:57.8814383Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8814389Z 2025-12-04T10:11:57.8814515Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8815046Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8815050Z 2025-12-04T10:11:57.8815208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8815335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8815436Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8815839Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8815970Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8816061Z graph_break [] 2025-12-04T10:11:57.8816190Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8816884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8816952Z if out == self.unknown_value: 2025-12-04T10:11:57.8817260Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8817353Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8817479Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8817827Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8817885Z graph_break [] 2025-12-04T10:11:57.8817983Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8818281Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.8818424Z Traceback (most recent call last): 2025-12-04T10:11:57.8818733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8818795Z method(*args, **kwargs) 2025-12-04T10:11:57.8819089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8819152Z method(*args, **kwargs) 2025-12-04T10:11:57.8819444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8819506Z with policy(): 2025-12-04T10:11:57.8819795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8819921Z raise RuntimeError(msg) 2025-12-04T10:11:57.8820746Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8820750Z 2025-12-04T10:11:57.8820875Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8821411Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8821416Z 2025-12-04T10:11:57.8821572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8821701Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8821792Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8822133Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8822261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8822318Z graph_break [] 2025-12-04T10:11:57.8822443Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8823181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8823251Z if out == self.unknown_value: 2025-12-04T10:11:57.8823377Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8823510Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8823630Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8824464Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8824523Z graph_break [] 2025-12-04T10:11:57.8824647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8824732Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8824854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8825193Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8825249Z graph_break [] 2025-12-04T10:11:57.8825745Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml - 2025-12-04T10:11:57.8825886Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8827185Z FAILED [0.4504s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8827194Z 2025-12-04T10:11:57.8827316Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8827837Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8827875Z 2025-12-04T10:11:57.8828036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8828139Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8828254Z ================== 1 failed, 57 deselected, 2 rerun in 11.91s ================== 2025-12-04T10:11:57.8828313Z Got exit code 1 2025-12-04T10:11:57.8828788Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.8829036Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8829298Z W1204 10:03:24.328000 60737 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8829688Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml 2025-12-04T10:11:57.8829786Z ============================= test session starts ============================== 2025-12-04T10:11:57.8829998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8830066Z cachedir: .pytest_cache 2025-12-04T10:11:57.8830380Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8830458Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8830563Z configfile: 
pytest.ini 2025-12-04T10:11:57.8830880Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8831012Z collecting ... collected 58 items / 47 deselected / 11 selected 2025-12-04T10:11:57.8831132Z stepcurrent: skipping 47 already run items. 2025-12-04T10:11:57.8831202Z Running 11 items in this shard 2025-12-04T10:11:57.8831206Z 2025-12-04T10:11:57.8831704Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8700s] [ 9%] 2025-12-04T10:11:57.8832188Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4602s] [ 9%] 2025-12-04T10:11:57.8832632Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4700s] [ 9%] 2025-12-04T10:11:57.8832637Z 2025-12-04T10:11:57.8832717Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8833018Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8833093Z Traceback (most recent call last): 2025-12-04T10:11:57.8833436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8833515Z method(*args, **kwargs) 2025-12-04T10:11:57.8833813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8833876Z method(*args, **kwargs) 2025-12-04T10:11:57.8834171Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8834230Z with policy(): 2025-12-04T10:11:57.8834529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8834595Z raise RuntimeError(msg) 2025-12-04T10:11:57.8835425Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8835430Z 2025-12-04T10:11:57.8835556Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8836076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8836081Z 2025-12-04T10:11:57.8836241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8836366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8836463Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8836817Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8836944Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8837003Z graph_break [] 2025-12-04T10:11:57.8837292Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8837363Z Traceback (most recent call last): 2025-12-04T10:11:57.8837700Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8837763Z method(*args, **kwargs) 2025-12-04T10:11:57.8838060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8838122Z method(*args, **kwargs) 2025-12-04T10:11:57.8838446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8838508Z with policy(): 2025-12-04T10:11:57.8838803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8838867Z raise RuntimeError(msg) 2025-12-04T10:11:57.8839678Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8839682Z 2025-12-04T10:11:57.8839802Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8840375Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8840382Z 2025-12-04T10:11:57.8840539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8840705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8840797Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8841140Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8841268Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8841326Z graph_break [] 2025-12-04T10:11:57.8841450Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8841540Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8841661Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8842058Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8842117Z graph_break [] 2025-12-04T10:11:57.8842198Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8842489Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8842560Z Traceback (most recent call last): 2025-12-04T10:11:57.8842861Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8842941Z method(*args, **kwargs) 2025-12-04T10:11:57.8843234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8843299Z method(*args, **kwargs) 2025-12-04T10:11:57.8843591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8843649Z with policy(): 2025-12-04T10:11:57.8843944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8844013Z raise RuntimeError(msg) 2025-12-04T10:11:57.8844862Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8844867Z 2025-12-04T10:11:57.8844989Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8845506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8845548Z 2025-12-04T10:11:57.8845704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8845828Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8845918Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8846261Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8846382Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8846443Z graph_break [] 2025-12-04T10:11:57.8846564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8846655Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8846774Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8847114Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8847207Z graph_break [] 2025-12-04T10:11:57.8847329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8847417Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8847538Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8847877Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8847940Z graph_break [] 2025-12-04T10:11:57.8848429Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml - 2025-12-04T10:11:57.8848563Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8849844Z FAILED [0.4700s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8849849Z 2025-12-04T10:11:57.8849970Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8850486Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8850491Z 2025-12-04T10:11:57.8850646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8850763Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8850881Z ================== 1 failed, 47 deselected, 2 rerun in 2.82s =================== 2025-12-04T10:11:57.8850943Z Got exit code 1 2025-12-04T10:11:57.8851016Z Retrying single test... 
2025-12-04T10:11:57.8851280Z W1204 10:03:33.984000 60918 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8851786Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml 2025-12-04T10:11:57.8851890Z ============================= test session starts ============================== 2025-12-04T10:11:57.8852097Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8852199Z cachedir: .pytest_cache 2025-12-04T10:11:57.8852506Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8852582Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8852655Z configfile: pytest.ini 2025-12-04T10:11:57.8852978Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8853110Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8853683Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8853753Z Running 1 items in this shard 2025-12-04T10:11:57.8853757Z 2025-12-04T10:11:57.8854487Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:35.036679337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8854494Z 2025-12-04T10:11:57.8854831Z [W1204 10:03:44.962890384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8854835Z 2025-12-04T10:11:57.8855128Z [W1204 10:03:44.963151158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8855132Z 2025-12-04T10:11:57.8855422Z [W1204 10:03:44.969067388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8855425Z 2025-12-04T10:11:57.8855714Z [W1204 10:03:44.969650748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8855753Z 2025-12-04T10:11:57.8856042Z [W1204 10:03:44.969817831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8856045Z 2025-12-04T10:11:57.8856335Z [W1204 10:03:44.975333125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8856339Z 2025-12-04T10:11:57.8856625Z [W1204 10:03:44.975913185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8856629Z 2025-12-04T10:11:57.8856918Z [W1204 10:03:44.976082158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8856921Z 2025-12-04T10:11:57.8857004Z ('RERUN', {'yellow': True}) [10.7856s] [100%] 2025-12-04T10:11:57.8857725Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:45.146966674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8857731Z 2025-12-04T10:11:57.8858025Z [W1204 10:03:45.147516554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8858029Z 2025-12-04T10:11:57.8858313Z [W1204 10:03:45.147654936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8858316Z 2025-12-04T10:11:57.8858646Z [W1204 10:03:45.150583406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8858650Z 2025-12-04T10:11:57.8858939Z [W1204 10:03:45.151133305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8858975Z 2025-12-04T10:11:57.8859269Z [W1204 10:03:45.151271268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8859272Z 2025-12-04T10:11:57.8859562Z [W1204 10:03:45.155762335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8859565Z 2025-12-04T10:11:57.8859855Z [W1204 10:03:45.156219002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8859858Z 2025-12-04T10:11:57.8860144Z [W1204 10:03:45.156362925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8860147Z 2025-12-04T10:11:57.8860225Z ('RERUN', {'yellow': True}) [0.4077s] [100%] 2025-12-04T10:11:57.8860947Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:45.551676282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8860954Z 2025-12-04T10:11:57.8861285Z [W1204 10:03:45.552225262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8861290Z 2025-12-04T10:11:57.8861584Z [W1204 10:03:45.552375454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8861588Z 2025-12-04T10:11:57.8861879Z [W1204 10:03:45.555250194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8861883Z 2025-12-04T10:11:57.8862173Z [W1204 10:03:45.555794883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8862177Z 2025-12-04T10:11:57.8862464Z [W1204 10:03:45.555943665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8862501Z 2025-12-04T10:11:57.8862795Z [W1204 10:03:45.560461392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8862799Z 2025-12-04T10:11:57.8863086Z [W1204 10:03:45.560917490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8863089Z 2025-12-04T10:11:57.8863381Z [W1204 10:03:45.561055153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8863385Z 2025-12-04T10:11:57.8863446Z FAILED [0.4051s] [100%] 2025-12-04T10:11:57.8863449Z 2025-12-04T10:11:57.8863535Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8863833Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8863909Z Traceback (most recent call last): 2025-12-04T10:11:57.8864228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8864293Z method(*args, **kwargs) 2025-12-04T10:11:57.8864587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8864649Z method(*args, **kwargs) 2025-12-04T10:11:57.8864938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8865036Z with policy(): 2025-12-04T10:11:57.8865335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8865400Z raise RuntimeError(msg) 2025-12-04T10:11:57.8866238Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8866244Z 2025-12-04T10:11:57.8866374Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8866898Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8866902Z 2025-12-04T10:11:57.8867069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8867204Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8867299Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8867649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8867838Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8867900Z graph_break [] 2025-12-04T10:11:57.8868026Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8868725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8868797Z if out == self.unknown_value: 2025-12-04T10:11:57.8869095Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8869166Z Traceback (most recent call last): 2025-12-04T10:11:57.8869472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8869574Z method(*args, **kwargs) 2025-12-04T10:11:57.8869868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8869933Z method(*args, **kwargs) 2025-12-04T10:11:57.8870225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8870283Z with policy(): 2025-12-04T10:11:57.8870584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8870649Z raise RuntimeError(msg) 2025-12-04T10:11:57.8871460Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8871467Z 2025-12-04T10:11:57.8871594Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8872115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8872121Z 2025-12-04T10:11:57.8872277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8872441Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8872541Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8872891Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8873054Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8873115Z graph_break [] 2025-12-04T10:11:57.8873238Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8873937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8874005Z if out == self.unknown_value: 2025-12-04T10:11:57.8874126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8874222Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8874343Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8874689Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8874750Z graph_break [] 2025-12-04T10:11:57.8874833Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8875166Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8875239Z Traceback (most recent call last): 2025-12-04T10:11:57.8875544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8875614Z method(*args, **kwargs) 2025-12-04T10:11:57.8875908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8875973Z method(*args, **kwargs) 2025-12-04T10:11:57.8876261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8876356Z with policy(): 2025-12-04T10:11:57.8876653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8876717Z raise RuntimeError(msg) 2025-12-04T10:11:57.8877527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8877535Z 2025-12-04T10:11:57.8877665Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8878183Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8878189Z 2025-12-04T10:11:57.8878349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8878474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8878567Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8878909Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8879033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8879095Z graph_break [] 2025-12-04T10:11:57.8879217Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8879988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8880096Z if out == self.unknown_value: 2025-12-04T10:11:57.8880218Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8880309Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8880435Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8880774Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8880835Z graph_break [] 2025-12-04T10:11:57.8880955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8881047Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8881167Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8881521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8881591Z graph_break [] 2025-12-04T10:11:57.8882114Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml - 2025-12-04T10:11:57.8882218Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8883497Z FAILED [0.4051s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8883502Z 2025-12-04T10:11:57.8883634Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8884186Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8884190Z 2025-12-04T10:11:57.8884349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8884455Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8884570Z ================== 1 failed, 57 deselected, 2 rerun in 11.62s ================== 2025-12-04T10:11:57.8884633Z Got exit code 1 2025-12-04T10:11:57.8884699Z Retrying single test... 2025-12-04T10:11:57.8884961Z W1204 10:03:52.170000 61104 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8885352Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml 2025-12-04T10:11:57.8885449Z ============================= test session starts ============================== 2025-12-04T10:11:57.8885662Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8885730Z cachedir: .pytest_cache 2025-12-04T10:11:57.8886033Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8886112Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8886180Z configfile: pytest.ini 2025-12-04T10:11:57.8886532Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8886667Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8887235Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8887344Z Running 1 items in this shard 2025-12-04T10:11:57.8887347Z 2025-12-04T10:11:57.8888075Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:53.220567579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8888079Z 2025-12-04T10:11:57.8888377Z [W1204 10:04:02.151917132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8888385Z 2025-12-04T10:11:57.8888674Z [W1204 10:04:02.152181576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8888678Z 2025-12-04T10:11:57.8888966Z [W1204 10:04:02.158054136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8888972Z 2025-12-04T10:11:57.8889297Z [W1204 10:04:02.158642337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8889301Z 2025-12-04T10:11:57.8889589Z [W1204 10:04:02.158810269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8889592Z 2025-12-04T10:11:57.8889879Z [W1204 10:04:02.164341604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8889883Z 2025-12-04T10:11:57.8890170Z [W1204 10:04:02.164911524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8890173Z 2025-12-04T10:11:57.8890464Z [W1204 10:04:02.165087417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8890502Z 2025-12-04T10:11:57.8890584Z ('RERUN', {'yellow': True}) [10.7817s] [100%] 2025-12-04T10:11:57.8891312Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:04:03.327605421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8891318Z 2025-12-04T10:11:57.8891605Z [W1204 10:04:03.328161791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8891608Z 2025-12-04T10:11:57.8891895Z [W1204 10:04:03.328308424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8891901Z 2025-12-04T10:11:57.8892186Z [W1204 10:04:03.331226553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8892192Z 2025-12-04T10:11:57.8892482Z [W1204 10:04:03.331774182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8892486Z 2025-12-04T10:11:57.8892775Z [W1204 10:04:03.331911605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8892779Z 2025-12-04T10:11:57.8893067Z [W1204 10:04:03.336338740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8893071Z 2025-12-04T10:11:57.8893393Z [W1204 10:04:03.336790128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8893397Z 2025-12-04T10:11:57.8893684Z [W1204 10:04:03.336925990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8893738Z 2025-12-04T10:11:57.8893821Z ('RERUN', {'yellow': True}) [0.4087s] [100%] 2025-12-04T10:11:57.8894554Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:04:03.734393409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8894558Z 2025-12-04T10:11:57.8894848Z [W1204 10:04:03.734951398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8894855Z 2025-12-04T10:11:57.8895142Z [W1204 10:04:03.735098841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8895146Z 2025-12-04T10:11:57.8895434Z [W1204 10:04:03.737975630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8895440Z 2025-12-04T10:11:57.8895728Z [W1204 10:04:03.738519659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8895768Z 2025-12-04T10:11:57.8896056Z [W1204 10:04:03.738665691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8896059Z 2025-12-04T10:11:57.8896350Z [W1204 10:04:03.743114147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8896354Z 2025-12-04T10:11:57.8896641Z [W1204 10:04:03.743564805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8896644Z 2025-12-04T10:11:57.8896930Z [W1204 10:04:03.743701268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8896968Z 2025-12-04T10:11:57.8897029Z FAILED [0.4070s] [100%] 2025-12-04T10:11:57.8897032Z 2025-12-04T10:11:57.8897115Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8897413Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8897485Z Traceback (most recent call last): 2025-12-04T10:11:57.8897796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8897860Z method(*args, **kwargs) 2025-12-04T10:11:57.8898157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8898221Z method(*args, **kwargs) 2025-12-04T10:11:57.8898509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8898573Z with policy(): 2025-12-04T10:11:57.8898867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8898935Z raise RuntimeError(msg) 2025-12-04T10:11:57.8899731Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8899735Z 2025-12-04T10:11:57.8899896Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8900417Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8900453Z 2025-12-04T10:11:57.8900613Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8900738Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8900836Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8901180Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8901309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8901367Z graph_break [] 2025-12-04T10:11:57.8901492Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8902189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8902260Z if out == self.unknown_value: 2025-12-04T10:11:57.8902552Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8902657Z Traceback (most recent call last): 2025-12-04T10:11:57.8902956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8903021Z method(*args, **kwargs) 2025-12-04T10:11:57.8903313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8903375Z method(*args, **kwargs) 2025-12-04T10:11:57.8903669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8903728Z with policy(): 2025-12-04T10:11:57.8904027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8904127Z raise RuntimeError(msg) 2025-12-04T10:11:57.8904942Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8904949Z 2025-12-04T10:11:57.8905077Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8905596Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8905600Z 2025-12-04T10:11:57.8905758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8905879Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8905974Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8906324Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8906451Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8906511Z graph_break [] 2025-12-04T10:11:57.8906633Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8907357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8907429Z if out == self.unknown_value: 2025-12-04T10:11:57.8907549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8907673Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8907793Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8908134Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8908196Z graph_break [] 2025-12-04T10:11:57.8908281Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8908568Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.8908641Z Traceback (most recent call last): 2025-12-04T10:11:57.8908936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8909007Z method(*args, **kwargs) 2025-12-04T10:11:57.8909300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8909365Z method(*args, **kwargs) 2025-12-04T10:11:57.8909693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8909752Z with policy(): 2025-12-04T10:11:57.8910051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8910115Z raise RuntimeError(msg) 2025-12-04T10:11:57.8910924Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8910927Z 2025-12-04T10:11:57.8911054Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8911608Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8911613Z 2025-12-04T10:11:57.8911771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8911892Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8911979Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8912332Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8912453Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8912513Z graph_break [] 2025-12-04T10:11:57.8912632Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8913324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8913397Z if out == self.unknown_value: 2025-12-04T10:11:57.8913518Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8913608Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8913732Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8914108Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8914170Z graph_break [] 2025-12-04T10:11:57.8914292Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8914412Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8914536Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8914874Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8914932Z graph_break [] 2025-12-04T10:11:57.8915417Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml - 2025-12-04T10:11:57.8915528Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8916810Z FAILED [0.4070s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8916817Z 2025-12-04T10:11:57.8916979Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8917671Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8917675Z 2025-12-04T10:11:57.8917832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8917942Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8918057Z ================== 1 failed, 57 deselected, 2 rerun in 11.62s ================== 2025-12-04T10:11:57.8918114Z Got exit code 1 2025-12-04T10:11:57.8918592Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.8918904Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.8919173Z W1204 10:04:10.305000 61290 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8919559Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml 2025-12-04T10:11:57.8919654Z ============================= test session starts ============================== 2025-12-04T10:11:57.8919940Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8920008Z cachedir: .pytest_cache 2025-12-04T10:11:57.8920319Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8920400Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8920464Z configfile: pytest.ini 
2025-12-04T10:11:57.8920782Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8920909Z collecting ... collected 58 items / 48 deselected / 10 selected 2025-12-04T10:11:57.8920995Z stepcurrent: skipping 48 already run items. 2025-12-04T10:11:57.8921067Z Running 10 items in this shard 2025-12-04T10:11:57.8921071Z 2025-12-04T10:11:57.8921642Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9285s] [ 10%] 2025-12-04T10:11:57.8922134Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5345s] [ 10%] 2025-12-04T10:11:57.8922622Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5538s] [ 10%] 2025-12-04T10:11:57.8922627Z 2025-12-04T10:11:57.8922712Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8923001Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8923073Z Traceback (most recent call last): 2025-12-04T10:11:57.8923387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8923463Z method(*args, **kwargs) 2025-12-04T10:11:57.8923761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8923830Z method(*args, **kwargs) 2025-12-04T10:11:57.8924124Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8924184Z with policy(): 2025-12-04T10:11:57.8924532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8924599Z raise RuntimeError(msg) 2025-12-04T10:11:57.8925397Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8925401Z 2025-12-04T10:11:57.8925536Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8926061Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8926100Z 2025-12-04T10:11:57.8926258Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8926385Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8926484Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8927029Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8927160Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8927219Z graph_break [] 2025-12-04T10:11:57.8927510Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.8927587Z Traceback (most recent call last): 2025-12-04T10:11:57.8927883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8927949Z method(*args, **kwargs) 2025-12-04T10:11:57.8928240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8928306Z method(*args, **kwargs) 2025-12-04T10:11:57.8928608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8928665Z with policy(): 2025-12-04T10:11:57.8928992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8929060Z raise RuntimeError(msg) 2025-12-04T10:11:57.8929867Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8929907Z 2025-12-04T10:11:57.8930037Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8930553Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8930557Z 2025-12-04T10:11:57.8930718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8930842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8930933Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8931476Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8931635Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8931698Z graph_break [] 2025-12-04T10:11:57.8931820Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8931908Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8932028Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8932567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8932623Z graph_break [] 2025-12-04T10:11:57.8932708Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8933031Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8933107Z 
Traceback (most recent call last): 2025-12-04T10:11:57.8933407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8933473Z method(*args, **kwargs) 2025-12-04T10:11:57.8933775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8933839Z method(*args, **kwargs) 2025-12-04T10:11:57.8934135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8934193Z with policy(): 2025-12-04T10:11:57.8934489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8934556Z raise RuntimeError(msg) 2025-12-04T10:11:57.8935367Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.8935371Z 2025-12-04T10:11:57.8935497Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8936045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8936050Z 2025-12-04T10:11:57.8936206Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8936332Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8936454Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8936997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8937118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8937175Z graph_break [] 2025-12-04T10:11:57.8937300Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8937385Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8937505Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8938048Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8938119Z graph_break [] 2025-12-04T10:11:57.8938245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8938331Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8938483Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.8939020Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8939078Z graph_break [] 2025-12-04T10:11:57.8939567Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml - 2025-12-04T10:11:57.8939665Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8940945Z FAILED [0.5538s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8940982Z 2025-12-04T10:11:57.8941105Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8941623Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8941629Z 2025-12-04T10:11:57.8941785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8941888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8942005Z ================== 1 failed, 48 deselected, 2 rerun in 3.04s =================== 2025-12-04T10:11:57.8942063Z Got exit code 1 2025-12-04T10:11:57.8942129Z Retrying single test... 
2025-12-04T10:11:57.8942390Z W1204 10:04:19.970000 61472 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8942774Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml 2025-12-04T10:11:57.8942871Z ============================= test session starts ============================== 2025-12-04T10:11:57.8943110Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8943177Z cachedir: .pytest_cache 2025-12-04T10:11:57.8943484Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8943594Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8943659Z configfile: pytest.ini 2025-12-04T10:11:57.8943977Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8944105Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8944676Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8944746Z Running 1 items in this shard 2025-12-04T10:11:57.8944750Z 2025-12-04T10:11:57.8945475Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:21.553416183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8945485Z 2025-12-04T10:11:57.8945820Z [W1204 10:04:30.555430353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8945824Z 2025-12-04T10:11:57.8946113Z [W1204 10:04:30.555683357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8946120Z 2025-12-04T10:11:57.8946414Z [W1204 10:04:30.561745621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8946417Z 2025-12-04T10:11:57.8946707Z [W1204 10:04:30.562347121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8946711Z 2025-12-04T10:11:57.8947003Z [W1204 10:04:30.562527714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8947057Z 2025-12-04T10:11:57.8947345Z [W1204 10:04:30.567996727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8947350Z 2025-12-04T10:11:57.8947641Z [W1204 10:04:30.568540006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8947644Z 2025-12-04T10:11:57.8947931Z [W1204 10:04:30.568700649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8947935Z 2025-12-04T10:11:57.8948017Z ('RERUN', {'yellow': True}) [10.9380s] [100%] 2025-12-04T10:11:57.8948741Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:31.363507874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8948748Z 2025-12-04T10:11:57.8949035Z [W1204 10:04:31.364054814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8949041Z 2025-12-04T10:11:57.8949328Z [W1204 10:04:31.364193516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8949331Z 2025-12-04T10:11:57.8949615Z [W1204 10:04:31.367100486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8949619Z 2025-12-04T10:11:57.8949941Z [W1204 10:04:31.367542894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8949945Z 2025-12-04T10:11:57.8950234Z [W1204 10:04:31.367680286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8950272Z 2025-12-04T10:11:57.8950564Z [W1204 10:04:31.372203253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8950568Z 2025-12-04T10:11:57.8950866Z [W1204 10:04:31.372668701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8950870Z 2025-12-04T10:11:57.8951164Z [W1204 10:04:31.372803163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8951167Z 2025-12-04T10:11:57.8951248Z ('RERUN', {'yellow': True}) [0.4916s] [100%] 2025-12-04T10:11:57.8951970Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:31.853352985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8951979Z 2025-12-04T10:11:57.8952264Z [W1204 10:04:31.853890874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8952268Z 2025-12-04T10:11:57.8952588Z [W1204 10:04:31.854031006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8952596Z 2025-12-04T10:11:57.8952883Z [W1204 10:04:31.856915786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8952886Z 2025-12-04T10:11:57.8953173Z [W1204 10:04:31.857361433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8953176Z 2025-12-04T10:11:57.8953466Z [W1204 10:04:31.857499405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8953505Z 2025-12-04T10:11:57.8953795Z [W1204 10:04:31.862038953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8953798Z 2025-12-04T10:11:57.8954093Z [W1204 10:04:31.862495101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8954097Z 2025-12-04T10:11:57.8954383Z [W1204 10:04:31.862631553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8954386Z 2025-12-04T10:11:57.8954448Z FAILED [0.4925s] [100%] 2025-12-04T10:11:57.8954451Z 2025-12-04T10:11:57.8954536Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8954828Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8954906Z Traceback (most recent call last): 2025-12-04T10:11:57.8955218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8955285Z method(*args, **kwargs) 2025-12-04T10:11:57.8955589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8955653Z method(*args, **kwargs) 2025-12-04T10:11:57.8955947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8956005Z with policy(): 2025-12-04T10:11:57.8956335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8956405Z raise RuntimeError(msg) 2025-12-04T10:11:57.8957199Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8957237Z 2025-12-04T10:11:57.8957367Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8957885Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8957889Z 2025-12-04T10:11:57.8958049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8958185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8958281Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8958829Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8958959Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8959020Z graph_break [] 2025-12-04T10:11:57.8959178Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8959918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8959992Z if out == self.unknown_value: 2025-12-04T10:11:57.8960289Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8960363Z Traceback (most recent call last): 2025-12-04T10:11:57.8960663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8960841Z method(*args, **kwargs) 2025-12-04T10:11:57.8961137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8961200Z method(*args, **kwargs) 2025-12-04T10:11:57.8961490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8961549Z with policy(): 2025-12-04T10:11:57.8961843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8961907Z raise RuntimeError(msg) 2025-12-04T10:11:57.8962721Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8962728Z 2025-12-04T10:11:57.8962852Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8963373Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8963376Z 2025-12-04T10:11:57.8963536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8963670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8963799Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8964343Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8964508Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8964569Z graph_break [] 2025-12-04T10:11:57.8964697Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8965392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8965464Z if out == self.unknown_value: 2025-12-04T10:11:57.8965590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8965687Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8965815Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8966353Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8966414Z graph_break [] 2025-12-04T10:11:57.8966546Z =================================== FAILURES =================================== 2025-12-04T10:11:57.8966843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8966917Z Traceback (most recent call last): 2025-12-04T10:11:57.8967221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8967287Z method(*args, **kwargs) 2025-12-04T10:11:57.8967583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8967645Z method(*args, **kwargs) 2025-12-04T10:11:57.8967936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8968039Z with policy(): 2025-12-04T10:11:57.8968334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8968402Z raise RuntimeError(msg) 2025-12-04T10:11:57.8969214Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8969218Z 2025-12-04T10:11:57.8969343Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8969866Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8969872Z 2025-12-04T10:11:57.8970031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8970158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8970247Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8970790Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8970919Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8971012Z graph_break [] 2025-12-04T10:11:57.8971139Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8971831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8971933Z if out == self.unknown_value: 2025-12-04T10:11:57.8972058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8972150Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8972277Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8972815Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8972873Z graph_break [] 2025-12-04T10:11:57.8972998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8973085Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8973206Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8973795Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8973855Z graph_break [] 2025-12-04T10:11:57.8974355Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml - 2025-12-04T10:11:57.8974458Z =========================== short test summary info ============================ 2025-12-04T10:11:57.8975747Z FAILED [0.4925s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.8975787Z 2025-12-04T10:11:57.8975913Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8976431Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8976436Z 2025-12-04T10:11:57.8976592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8976699Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.8976817Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.8976876Z Got exit code 1 2025-12-04T10:11:57.8976941Z Retrying single test... 2025-12-04T10:11:57.8977206Z W1204 10:04:38.407000 61658 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.8977591Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml 2025-12-04T10:11:57.8977685Z ============================= test session starts ============================== 2025-12-04T10:11:57.8977890Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.8977953Z cachedir: .pytest_cache 2025-12-04T10:11:57.8978299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.8978378Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.8978444Z configfile: pytest.ini 2025-12-04T10:11:57.8978759Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.8978922Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.8979495Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8979565Z Running 1 items in this shard 2025-12-04T10:11:57.8979568Z 2025-12-04T10:11:57.8980299Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:40.996144010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8980303Z 2025-12-04T10:11:57.8980600Z [W1204 10:04:49.261777275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8980606Z 2025-12-04T10:11:57.8980898Z [W1204 10:04:49.262038479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8980906Z 2025-12-04T10:11:57.8981228Z [W1204 10:04:49.268750784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8981231Z 2025-12-04T10:11:57.8981519Z [W1204 10:04:49.269357304 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8981522Z 2025-12-04T10:11:57.8981813Z [W1204 10:04:49.269539797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8981816Z 2025-12-04T10:11:57.8982117Z [W1204 10:04:49.275117143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8982122Z 2025-12-04T10:11:57.8982448Z [W1204 10:04:49.275655082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8982451Z 2025-12-04T10:11:57.8982739Z [W1204 10:04:49.275816115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8982742Z 2025-12-04T10:11:57.8982826Z ('RERUN', {'yellow': True}) [11.2059s] [100%] 2025-12-04T10:11:57.8983549Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:50.073150382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8983553Z 2025-12-04T10:11:57.8983840Z [W1204 10:04:50.073712082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8983845Z 2025-12-04T10:11:57.8984132Z [W1204 10:04:50.073862225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8984136Z 2025-12-04T10:11:57.8984423Z [W1204 10:04:50.076833365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8984429Z 2025-12-04T10:11:57.8984715Z [W1204 10:04:50.077283123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8984718Z 2025-12-04T10:11:57.8985036Z [W1204 10:04:50.077421975 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8985040Z 2025-12-04T10:11:57.8985328Z [W1204 10:04:50.081996884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8985364Z 2025-12-04T10:11:57.8985649Z [W1204 10:04:50.082458701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8985654Z 2025-12-04T10:11:57.8985944Z [W1204 10:04:50.082594814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8985948Z 2025-12-04T10:11:57.8986025Z ('RERUN', {'yellow': True}) [0.5010s] [100%] 2025-12-04T10:11:57.8986745Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:50.573192013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8986748Z 2025-12-04T10:11:57.8987032Z [W1204 10:04:50.573729472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8987037Z 2025-12-04T10:11:57.8987326Z [W1204 10:04:50.573871635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8987331Z 2025-12-04T10:11:57.8987650Z [W1204 10:04:50.576744784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8987654Z 2025-12-04T10:11:57.8987940Z [W1204 10:04:50.577189791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.8987943Z 2025-12-04T10:11:57.8988241Z [W1204 10:04:50.577329814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8988244Z 2025-12-04T10:11:57.8988534Z [W1204 10:04:50.581867111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8988537Z 2025-12-04T10:11:57.8988828Z [W1204 10:04:50.582327729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8988864Z 2025-12-04T10:11:57.8989154Z [W1204 10:04:50.582462952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.8989158Z 2025-12-04T10:11:57.8989220Z FAILED [0.4981s] [100%] 2025-12-04T10:11:57.8989223Z 2025-12-04T10:11:57.8989303Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.8989595Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8989672Z Traceback (most recent call last): 2025-12-04T10:11:57.8989977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8990049Z method(*args, **kwargs) 2025-12-04T10:11:57.8990351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8990413Z method(*args, **kwargs) 2025-12-04T10:11:57.8990708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8990766Z with policy(): 2025-12-04T10:11:57.8991192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.8991321Z raise RuntimeError(msg) 2025-12-04T10:11:57.8992190Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.8992195Z 2025-12-04T10:11:57.8992331Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8992907Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8992911Z 2025-12-04T10:11:57.8993077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8993204Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8993298Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8993849Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8993976Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8994040Z graph_break [] 2025-12-04T10:11:57.8994164Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.8994891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.8994969Z if out == self.unknown_value: 2025-12-04T10:11:57.8995262Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.8995336Z Traceback (most recent call last): 2025-12-04T10:11:57.8995635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8995697Z method(*args, **kwargs) 2025-12-04T10:11:57.8995990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.8996091Z method(*args, **kwargs) 2025-12-04T10:11:57.8996401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.8996464Z with policy(): 2025-12-04T10:11:57.8996769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.8996837Z raise RuntimeError(msg) 2025-12-04T10:11:57.8997657Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.8997662Z 2025-12-04T10:11:57.8997789Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.8998312Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.8998319Z 2025-12-04T10:11:57.8998481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.8998618Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.8998716Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.8999528Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.8999662Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.8999721Z graph_break [] 2025-12-04T10:11:57.8999849Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9000684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9000754Z if out == self.unknown_value: 2025-12-04T10:11:57.9000880Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9000973Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9001102Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9001648Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9001708Z graph_break [] 2025-12-04T10:11:57.9001793Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9002087Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9002164Z Traceback (most recent call last): 2025-12-04T10:11:57.9002504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9002569Z method(*args, **kwargs) 2025-12-04T10:11:57.9002863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9002924Z method(*args, **kwargs) 2025-12-04T10:11:57.9003213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9003274Z with policy(): 2025-12-04T10:11:57.9003567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9003672Z raise RuntimeError(msg) 2025-12-04T10:11:57.9004485Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9004489Z 2025-12-04T10:11:57.9004614Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9005136Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9005140Z 2025-12-04T10:11:57.9005299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9005428Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9005521Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9006070Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9006195Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9006252Z graph_break [] 2025-12-04T10:11:57.9006388Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9007114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9007188Z if out == self.unknown_value: 2025-12-04T10:11:57.9007313Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9007446Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9007569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9008115Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9008172Z graph_break [] 2025-12-04T10:11:57.9008297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9008384Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9008508Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9009042Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9009101Z graph_break [] 2025-12-04T10:11:57.9009626Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml - 2025-12-04T10:11:57.9009727Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9011011Z FAILED [0.4981s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9011016Z 2025-12-04T10:11:57.9011177Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9011697Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9011701Z 2025-12-04T10:11:57.9011854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9011956Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9012075Z ================== 1 failed, 57 deselected, 2 rerun in 12.23s ================== 2025-12-04T10:11:57.9012134Z Got exit code 1 2025-12-04T10:11:57.9012609Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9012850Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9013116Z W1204 10:04:57.168000 61845 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9013509Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml 2025-12-04T10:11:57.9013602Z ============================= test session starts ============================== 2025-12-04T10:11:57.9013810Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9013876Z cachedir: .pytest_cache 2025-12-04T10:11:57.9014219Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9014298Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9014361Z configfile: pytest.ini 2025-12-04T10:11:57.9014675Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9014852Z collecting ... collected 58 items / 49 deselected / 9 selected 2025-12-04T10:11:57.9014938Z stepcurrent: skipping 49 already run items. 2025-12-04T10:11:57.9015011Z Running 9 items in this shard 2025-12-04T10:11:57.9015015Z 2025-12-04T10:11:57.9015517Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0089s] [ 11%] 2025-12-04T10:11:57.9016012Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6159s] [ 11%] 2025-12-04T10:11:57.9016463Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6300s] [ 11%] 2025-12-04T10:11:57.9016469Z 2025-12-04T10:11:57.9016553Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9016891Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9016965Z Traceback (most recent call last): 2025-12-04T10:11:57.9017463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9017536Z method(*args, **kwargs) 2025-12-04T10:11:57.9017835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.9017901Z method(*args, **kwargs) 2025-12-04T10:11:57.9018196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9018255Z with policy(): 2025-12-04T10:11:57.9018559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9018695Z raise RuntimeError(msg) 2025-12-04T10:11:57.9019515Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9019519Z 2025-12-04T10:11:57.9019650Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9020176Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9020182Z 2025-12-04T10:11:57.9020338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9020468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9020565Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9020917Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9021045Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9021108Z graph_break [] 2025-12-04T10:11:57.9021406Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.9021532Z Traceback (most recent call last): 2025-12-04T10:11:57.9021836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9021898Z method(*args, **kwargs) 2025-12-04T10:11:57.9022250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9022315Z method(*args, **kwargs) 2025-12-04T10:11:57.9022609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9022669Z with policy(): 2025-12-04T10:11:57.9022966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9023038Z raise RuntimeError(msg) 2025-12-04T10:11:57.9023875Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9023879Z 2025-12-04T10:11:57.9024009Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9024586Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9024590Z 2025-12-04T10:11:57.9024750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9024876Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9024967Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9025318Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9025441Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9025499Z graph_break [] 2025-12-04T10:11:57.9025623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9025746Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9025866Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9026208Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9026267Z graph_break [] 2025-12-04T10:11:57.9026352Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9026647Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9026720Z Traceback (most recent call last): 2025-12-04T10:11:57.9027019Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9027082Z method(*args, **kwargs) 2025-12-04T10:11:57.9027374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9027443Z method(*args, **kwargs) 2025-12-04T10:11:57.9027731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9027790Z with policy(): 2025-12-04T10:11:57.9028093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9028159Z raise RuntimeError(msg) 2025-12-04T10:11:57.9029044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9029081Z 2025-12-04T10:11:57.9029207Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9029734Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9029738Z 2025-12-04T10:11:57.9029891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9030015Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9030107Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9030449Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9030574Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9030630Z graph_break [] 2025-12-04T10:11:57.9030751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9030840Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9030959Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9031340Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9031399Z graph_break [] 2025-12-04T10:11:57.9031521Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9031609Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9031729Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9032068Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9032131Z graph_break [] 2025-12-04T10:11:57.9032662Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml - 2025-12-04T10:11:57.9032767Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9034089Z FAILED [0.6300s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9034094Z 2025-12-04T10:11:57.9034226Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9034758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9034763Z 2025-12-04T10:11:57.9034919Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9035026Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9035139Z ================== 1 failed, 49 deselected, 2 rerun in 3.28s =================== 2025-12-04T10:11:57.9035200Z Got exit code 1 2025-12-04T10:11:57.9035264Z Retrying single test... 
2025-12-04T10:11:57.9035558Z W1204 10:05:07.003000 62034 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9035946Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml 2025-12-04T10:11:57.9036074Z ============================= test session starts ============================== 2025-12-04T10:11:57.9036287Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9036351Z cachedir: .pytest_cache 2025-12-04T10:11:57.9036655Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9036732Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9036797Z configfile: pytest.ini 2025-12-04T10:11:57.9037112Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9037241Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9037815Z stepcurrent: skipping 49 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9037893Z Running 1 items in this shard 2025-12-04T10:11:57.9037896Z 2025-12-04T10:11:57.9038682Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:08.246299095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9038689Z 2025-12-04T10:11:57.9038996Z [W1204 10:05:17.356698211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9039001Z 2025-12-04T10:11:57.9039295Z [W1204 10:05:17.356955755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9039298Z 2025-12-04T10:11:57.9039588Z [W1204 10:05:17.362761724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9039636Z 2025-12-04T10:11:57.9039965Z [W1204 10:05:17.363352754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9039969Z 2025-12-04T10:11:57.9040259Z [W1204 10:05:17.363510787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9040263Z 2025-12-04T10:11:57.9040569Z [W1204 10:05:17.369115383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9040572Z 2025-12-04T10:11:57.9040865Z [W1204 10:05:17.369648512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9040868Z 2025-12-04T10:11:57.9041160Z [W1204 10:05:17.369841685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9041167Z 2025-12-04T10:11:57.9041250Z ('RERUN', {'yellow': True}) [11.1581s] [100%] 2025-12-04T10:11:57.9041989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:18.723504635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9041992Z 2025-12-04T10:11:57.9042281Z [W1204 10:05:18.724053044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9042284Z 2025-12-04T10:11:57.9042616Z [W1204 10:05:18.724203717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9042620Z 2025-12-04T10:11:57.9042909Z [W1204 10:05:18.727224248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9042946Z 2025-12-04T10:11:57.9043236Z [W1204 10:05:18.727804698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9043243Z 2025-12-04T10:11:57.9043531Z [W1204 10:05:18.727950201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9043535Z 2025-12-04T10:11:57.9043820Z [W1204 10:05:18.732618070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9043823Z 2025-12-04T10:11:57.9044115Z [W1204 10:05:18.733086668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9044118Z 2025-12-04T10:11:57.9044403Z [W1204 10:05:18.733225721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9044408Z 2025-12-04T10:11:57.9044498Z ('RERUN', {'yellow': True}) [0.5964s] [100%] 2025-12-04T10:11:57.9045262Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:19.315144211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9045267Z 2025-12-04T10:11:57.9045559Z [W1204 10:05:19.315688540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9045562Z 2025-12-04T10:11:57.9045850Z [W1204 10:05:19.315836872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9045854Z 2025-12-04T10:11:57.9046140Z [W1204 10:05:19.318836144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9046148Z 2025-12-04T10:11:57.9046470Z [W1204 10:05:19.319398803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9046473Z 2025-12-04T10:11:57.9046763Z [W1204 10:05:19.319539856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9046767Z 2025-12-04T10:11:57.9047054Z [W1204 10:05:19.324189915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9047057Z 2025-12-04T10:11:57.9047345Z [W1204 10:05:19.324667033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9047348Z 2025-12-04T10:11:57.9047639Z [W1204 10:05:19.324805616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9047643Z 2025-12-04T10:11:57.9047703Z FAILED [0.5925s] [100%] 2025-12-04T10:11:57.9047708Z 2025-12-04T10:11:57.9047795Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9048093Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9048175Z Traceback (most recent call last): 2025-12-04T10:11:57.9048489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9048555Z method(*args, **kwargs) 2025-12-04T10:11:57.9048888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9048952Z method(*args, **kwargs) 2025-12-04T10:11:57.9049240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9049336Z with policy(): 2025-12-04T10:11:57.9049633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9049697Z raise RuntimeError(msg) 2025-12-04T10:11:57.9050510Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9050515Z 2025-12-04T10:11:57.9050641Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9051172Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9051175Z 2025-12-04T10:11:57.9051332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9051466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9051560Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9051942Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9052075Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9052135Z graph_break [] 2025-12-04T10:11:57.9052257Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9052954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9053025Z if out == self.unknown_value: 2025-12-04T10:11:57.9053386Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9053458Z Traceback (most recent call last): 2025-12-04T10:11:57.9053756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9053821Z method(*args, **kwargs) 2025-12-04T10:11:57.9054113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9054177Z method(*args, **kwargs) 2025-12-04T10:11:57.9054463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9054522Z with policy(): 2025-12-04T10:11:57.9054819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9054894Z raise RuntimeError(msg) 2025-12-04T10:11:57.9055731Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9055738Z 2025-12-04T10:11:57.9055865Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9056425Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9056429Z 2025-12-04T10:11:57.9056591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9056717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9056850Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9057196Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9057324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9057385Z graph_break [] 2025-12-04T10:11:57.9057506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9058198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9058266Z if out == self.unknown_value: 2025-12-04T10:11:57.9058387Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9058481Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9058605Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9058981Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9059044Z graph_break [] 2025-12-04T10:11:57.9059126Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9059429Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9059500Z Traceback (most recent call last): 2025-12-04T10:11:57.9059797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9059865Z method(*args, **kwargs) 2025-12-04T10:11:57.9060156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9060257Z method(*args, **kwargs) 2025-12-04T10:11:57.9060549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9060610Z with policy(): 2025-12-04T10:11:57.9060909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9060973Z raise RuntimeError(msg) 2025-12-04T10:11:57.9061798Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9061805Z 2025-12-04T10:11:57.9061930Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9062454Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9062461Z 2025-12-04T10:11:57.9062623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9062744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9062847Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9063226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9063350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9063413Z graph_break [] 2025-12-04T10:11:57.9063537Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9064260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9064335Z if out == self.unknown_value: 2025-12-04T10:11:57.9064459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9064554Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9064673Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9065010Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9065073Z graph_break [] 2025-12-04T10:11:57.9065193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9065280Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9065404Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9065776Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9065838Z graph_break [] 2025-12-04T10:11:57.9066324Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml - 2025-12-04T10:11:57.9066424Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9067740Z FAILED [0.5925s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9067779Z 2025-12-04T10:11:57.9067908Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9068432Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9068436Z 2025-12-04T10:11:57.9068591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9068699Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9068814Z ================== 1 failed, 57 deselected, 2 rerun in 12.37s ================== 2025-12-04T10:11:57.9068873Z Got exit code 1 2025-12-04T10:11:57.9068941Z Retrying single test... 2025-12-04T10:11:57.9069205Z W1204 10:05:25.913000 62228 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9069597Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml 2025-12-04T10:11:57.9069692Z ============================= test session starts ============================== 2025-12-04T10:11:57.9069898Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9069969Z cachedir: .pytest_cache 2025-12-04T10:11:57.9070313Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9070393Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9070457Z configfile: pytest.ini 2025-12-04T10:11:57.9070771Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9070938Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9071513Z stepcurrent: skipping 49 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9071582Z Running 1 items in this shard 2025-12-04T10:11:57.9071589Z 2025-12-04T10:11:57.9072330Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:27.138526815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9072334Z 2025-12-04T10:11:57.9072634Z [W1204 10:05:36.133227938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9072645Z 2025-12-04T10:11:57.9072938Z [W1204 10:05:36.133511293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9072941Z 2025-12-04T10:11:57.9073334Z [W1204 10:05:36.139256772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9073337Z 2025-12-04T10:11:57.9073630Z [W1204 10:05:36.139842393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9073633Z 2025-12-04T10:11:57.9073923Z [W1204 10:05:36.140054217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9073926Z 2025-12-04T10:11:57.9074217Z [W1204 10:05:36.145568332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9074222Z 2025-12-04T10:11:57.9074508Z [W1204 10:05:36.146093350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9074545Z 2025-12-04T10:11:57.9074838Z [W1204 10:05:36.146283974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9074842Z 2025-12-04T10:11:57.9074920Z ('RERUN', {'yellow': True}) [11.0213s] [100%] 2025-12-04T10:11:57.9075651Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:37.478784383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9075658Z 2025-12-04T10:11:57.9075946Z [W1204 10:05:37.479315333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9075951Z 2025-12-04T10:11:57.9076240Z [W1204 10:05:37.479458905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9076243Z 2025-12-04T10:11:57.9076532Z [W1204 10:05:37.482415196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9076535Z 2025-12-04T10:11:57.9076822Z [W1204 10:05:37.482978926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9076825Z 2025-12-04T10:11:57.9077147Z [W1204 10:05:37.483116928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9077151Z 2025-12-04T10:11:57.9077441Z [W1204 10:05:37.487528004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9077444Z 2025-12-04T10:11:57.9077773Z [W1204 10:05:37.487986992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9077779Z 2025-12-04T10:11:57.9078068Z [W1204 10:05:37.488132665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9078071Z 2025-12-04T10:11:57.9078153Z ('RERUN', {'yellow': True}) [0.5755s] [100%] 2025-12-04T10:11:57.9078894Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:38.051796783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9078898Z 2025-12-04T10:11:57.9079187Z [W1204 10:05:38.052337013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9079193Z 2025-12-04T10:11:57.9079480Z [W1204 10:05:38.052485406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9079485Z 2025-12-04T10:11:57.9079822Z [W1204 10:05:38.055420706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9079825Z 2025-12-04T10:11:57.9080159Z [W1204 10:05:38.055980776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9080163Z 2025-12-04T10:11:57.9080452Z [W1204 10:05:38.056119729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9080455Z 2025-12-04T10:11:57.9080745Z [W1204 10:05:38.060674738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9080748Z 2025-12-04T10:11:57.9081039Z [W1204 10:05:38.061138326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9081080Z 2025-12-04T10:11:57.9081377Z [W1204 10:05:38.061274698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9081380Z 2025-12-04T10:11:57.9081440Z FAILED [0.5728s] [100%] 2025-12-04T10:11:57.9081443Z 2025-12-04T10:11:57.9081527Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9081830Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9081909Z Traceback (most recent call last): 2025-12-04T10:11:57.9082222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9082285Z method(*args, **kwargs) 2025-12-04T10:11:57.9082593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9082664Z method(*args, **kwargs) 2025-12-04T10:11:57.9082957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9083020Z with policy(): 2025-12-04T10:11:57.9083315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9083380Z raise RuntimeError(msg) 2025-12-04T10:11:57.9084231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9084236Z 2025-12-04T10:11:57.9084365Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9084929Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9084933Z 2025-12-04T10:11:57.9085090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9085216Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9085311Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9085661Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9085791Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9085847Z graph_break [] 2025-12-04T10:11:57.9085969Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9086702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9086784Z if out == self.unknown_value: 2025-12-04T10:11:57.9087087Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9087161Z Traceback (most recent call last): 2025-12-04T10:11:57.9087460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9087528Z method(*args, **kwargs) 2025-12-04T10:11:57.9087820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9087883Z method(*args, **kwargs) 2025-12-04T10:11:57.9088177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9088279Z with policy(): 2025-12-04T10:11:57.9088579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9088643Z raise RuntimeError(msg) 2025-12-04T10:11:57.9089471Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9089475Z 2025-12-04T10:11:57.9089602Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9090132Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9090139Z 2025-12-04T10:11:57.9090301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9090429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9090523Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9090869Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9090994Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9091090Z graph_break [] 2025-12-04T10:11:57.9091213Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9091903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9092012Z if out == self.unknown_value: 2025-12-04T10:11:57.9092135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9092231Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9092353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9092694Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9092757Z graph_break [] 2025-12-04T10:11:57.9092841Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9093136Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9093211Z Traceback (most recent call last): 2025-12-04T10:11:57.9093507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9093573Z method(*args, **kwargs) 2025-12-04T10:11:57.9093897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9093960Z method(*args, **kwargs) 2025-12-04T10:11:57.9094251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9094309Z with policy(): 2025-12-04T10:11:57.9094608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9094671Z raise RuntimeError(msg) 2025-12-04T10:11:57.9095492Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9095531Z 2025-12-04T10:11:57.9095661Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9096183Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9096186Z 2025-12-04T10:11:57.9096349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9096473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9096562Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9096906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9097030Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9097092Z graph_break [] 2025-12-04T10:11:57.9097215Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9097901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9097974Z if out == self.unknown_value: 2025-12-04T10:11:57.9098136Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9098237Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9098368Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9098746Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9098811Z graph_break [] 2025-12-04T10:11:57.9098932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9099017Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9099139Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9099475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9099536Z graph_break [] 2025-12-04T10:11:57.9100023Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml - 2025-12-04T10:11:57.9100122Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9101473Z FAILED [0.5728s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9101478Z 2025-12-04T10:11:57.9101601Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9102131Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9102135Z 2025-12-04T10:11:57.9102292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9102433Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9102546Z ================== 1 failed, 57 deselected, 2 rerun in 12.19s ================== 2025-12-04T10:11:57.9102603Z Got exit code 1 2025-12-04T10:11:57.9103081Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9103324Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9103592Z W1204 10:05:44.629000 62422 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9103978Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml 2025-12-04T10:11:57.9104075Z ============================= test session starts ============================== 2025-12-04T10:11:57.9104284Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9104349Z cachedir: .pytest_cache 2025-12-04T10:11:57.9104662Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9104737Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9104802Z configfile: 
pytest.ini 2025-12-04T10:11:57.9105122Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9105291Z collecting ... collected 58 items / 50 deselected / 8 selected 2025-12-04T10:11:57.9105390Z stepcurrent: skipping 50 already run items. 2025-12-04T10:11:57.9105466Z Running 8 items in this shard 2025-12-04T10:11:57.9105469Z 2025-12-04T10:11:57.9105969Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8400s] [ 12%] 2025-12-04T10:11:57.9106513Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4448s] [ 12%] 2025-12-04T10:11:57.9106956Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4354s] [ 12%] 2025-12-04T10:11:57.9106959Z 2025-12-04T10:11:57.9107042Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9107336Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9107410Z Traceback (most recent call last): 2025-12-04T10:11:57.9107721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9107787Z method(*args, **kwargs) 2025-12-04T10:11:57.9108115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9108184Z method(*args, **kwargs) 2025-12-04T10:11:57.9108478Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9108545Z with policy(): 2025-12-04T10:11:57.9108843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9108910Z raise RuntimeError(msg) 2025-12-04T10:11:57.9109718Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9109757Z 2025-12-04T10:11:57.9109883Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9110408Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9110412Z 2025-12-04T10:11:57.9110569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9110694Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9110791Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9111135Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9111275Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9111335Z graph_break [] 2025-12-04T10:11:57.9111627Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9111702Z Traceback (most recent call last): 2025-12-04T10:11:57.9111999Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9112067Z method(*args, **kwargs) 2025-12-04T10:11:57.9112361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9112458Z method(*args, **kwargs) 2025-12-04T10:11:57.9112752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9112810Z with policy(): 2025-12-04T10:11:57.9113104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9113210Z raise RuntimeError(msg) 2025-12-04T10:11:57.9114021Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9114025Z 2025-12-04T10:11:57.9114153Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9114673Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9114677Z 2025-12-04T10:11:57.9114834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9114961Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9115050Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9115436Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9115560Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9115617Z graph_break [] 2025-12-04T10:11:57.9115744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9115832Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9115955Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9116294Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9116388Z graph_break [] 2025-12-04T10:11:57.9116475Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9116767Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9116841Z Traceback (most recent call last): 2025-12-04T10:11:57.9117282Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9117360Z method(*args, **kwargs) 2025-12-04T10:11:57.9117666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9117735Z method(*args, **kwargs) 2025-12-04T10:11:57.9118029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9118099Z with policy(): 2025-12-04T10:11:57.9118393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9118466Z raise RuntimeError(msg) 2025-12-04T10:11:57.9119282Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9119287Z 2025-12-04T10:11:57.9119410Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9120033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9120037Z 2025-12-04T10:11:57.9120195Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9120374Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9120464Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9120806Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9120932Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9120989Z graph_break [] 2025-12-04T10:11:57.9121114Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9121203Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9121324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9121671Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9121731Z graph_break [] 2025-12-04T10:11:57.9121852Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9121943Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9122114Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9122458Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9122514Z graph_break [] 2025-12-04T10:11:57.9123005Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml - 2025-12-04T10:11:57.9123110Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9124401Z FAILED [0.4354s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9124462Z 2025-12-04T10:11:57.9124595Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9125115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9125118Z 2025-12-04T10:11:57.9125277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9125386Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9125503Z ================== 1 failed, 50 deselected, 2 rerun in 2.74s =================== 2025-12-04T10:11:57.9125562Z Got exit code 1 2025-12-04T10:11:57.9125629Z Retrying single test... 
2025-12-04T10:11:57.9125894Z W1204 10:05:54.360000 62610 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9126281Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml 2025-12-04T10:11:57.9126376Z ============================= test session starts ============================== 2025-12-04T10:11:57.9126621Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9126687Z cachedir: .pytest_cache 2025-12-04T10:11:57.9126991Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9127104Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9127169Z configfile: pytest.ini 2025-12-04T10:11:57.9127485Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9127612Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9128179Z stepcurrent: skipping 50 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9128254Z Running 1 items in this shard 2025-12-04T10:11:57.9128258Z 2025-12-04T10:11:57.9128989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:05:55.414682188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9128996Z 2025-12-04T10:11:57.9129300Z [W1204 10:06:04.554893326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9129303Z 2025-12-04T10:11:57.9129629Z [W1204 10:06:04.555147050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9129634Z 2025-12-04T10:11:57.9129935Z [W1204 10:06:04.561695764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9129939Z 2025-12-04T10:11:57.9130229Z [W1204 10:06:04.562299774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9130233Z 2025-12-04T10:11:57.9130525Z [W1204 10:06:04.562471947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9130530Z 2025-12-04T10:11:57.9130852Z [W1204 10:06:04.567911641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9130856Z 2025-12-04T10:11:57.9131144Z [W1204 10:06:04.568444900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9131151Z 2025-12-04T10:11:57.9131451Z [W1204 10:06:04.568637173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9131455Z 2025-12-04T10:11:57.9131538Z ('RERUN', {'yellow': True}) [10.9993s] [100%] 2025-12-04T10:11:57.9132271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:05.737461229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9132276Z 2025-12-04T10:11:57.9132568Z [W1204 10:06:05.737988828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9132572Z 2025-12-04T10:11:57.9132865Z [W1204 10:06:05.738137200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9132869Z 2025-12-04T10:11:57.9133155Z [W1204 10:06:05.741079661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9133158Z 2025-12-04T10:11:57.9133501Z [W1204 10:06:05.741644051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9133505Z 2025-12-04T10:11:57.9133794Z [W1204 10:06:05.741785664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9133797Z 2025-12-04T10:11:57.9134127Z [W1204 10:06:05.746234751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9134135Z 2025-12-04T10:11:57.9134430Z [W1204 10:06:05.746692559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9134433Z 2025-12-04T10:11:57.9134721Z [W1204 10:06:05.746828031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9134728Z 2025-12-04T10:11:57.9138984Z ('RERUN', {'yellow': True}) [0.4077s] [100%] 2025-12-04T10:11:57.9139794Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:06.145598445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9139802Z 2025-12-04T10:11:57.9140118Z [W1204 10:06:06.146126184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9140128Z 2025-12-04T10:11:57.9140507Z [W1204 10:06:06.146270016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9140512Z 2025-12-04T10:11:57.9140804Z [W1204 10:06:06.149135036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9140807Z 2025-12-04T10:11:57.9141111Z [W1204 10:06:06.149694935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9141115Z 2025-12-04T10:11:57.9141403Z [W1204 10:06:06.149834148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9141407Z 2025-12-04T10:11:57.9141695Z [W1204 10:06:06.154235064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9141737Z 2025-12-04T10:11:57.9142024Z [W1204 10:06:06.154689152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9142027Z 2025-12-04T10:11:57.9142311Z [W1204 10:06:06.154826154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9142314Z 2025-12-04T10:11:57.9142378Z FAILED [0.4048s] [100%] 2025-12-04T10:11:57.9142382Z 2025-12-04T10:11:57.9142474Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9142781Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9142858Z Traceback (most recent call last): 2025-12-04T10:11:57.9143180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9143250Z method(*args, **kwargs) 2025-12-04T10:11:57.9143555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9143622Z method(*args, **kwargs) 2025-12-04T10:11:57.9143916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9143977Z with policy(): 2025-12-04T10:11:57.9144279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9144382Z raise RuntimeError(msg) 2025-12-04T10:11:57.9145196Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9145235Z 2025-12-04T10:11:57.9145383Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9145918Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9145922Z 2025-12-04T10:11:57.9146082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9146222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9146326Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9146682Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9146816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9146877Z graph_break [] 2025-12-04T10:11:57.9147009Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9147750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9147825Z if out == self.unknown_value: 2025-12-04T10:11:57.9148135Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9148212Z Traceback (most recent call last): 2025-12-04T10:11:57.9148517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9148586Z method(*args, **kwargs) 2025-12-04T10:11:57.9148876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9148976Z method(*args, **kwargs) 2025-12-04T10:11:57.9149268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9149328Z with policy(): 2025-12-04T10:11:57.9149621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9149684Z raise RuntimeError(msg) 2025-12-04T10:11:57.9150505Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9150510Z 2025-12-04T10:11:57.9150648Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9151173Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9151178Z 2025-12-04T10:11:57.9151358Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9151488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9151585Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9151974Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9152103Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9152165Z graph_break [] 2025-12-04T10:11:57.9152289Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9153026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9153102Z if out == self.unknown_value: 2025-12-04T10:11:57.9153230Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9153327Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9153457Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9153810Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9153870Z graph_break [] 2025-12-04T10:11:57.9153953Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9154252Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9154331Z Traceback (most recent call last): 2025-12-04T10:11:57.9154673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9154742Z method(*args, **kwargs) 2025-12-04T10:11:57.9155035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9155098Z method(*args, **kwargs) 2025-12-04T10:11:57.9155396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9155454Z with policy(): 2025-12-04T10:11:57.9155748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9155817Z raise RuntimeError(msg) 2025-12-04T10:11:57.9156680Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9156685Z 2025-12-04T10:11:57.9156820Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9157344Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9157347Z 2025-12-04T10:11:57.9157511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9157635Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9157727Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9158079Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9158205Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9158267Z graph_break [] 2025-12-04T10:11:57.9158389Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9159118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9159193Z if out == self.unknown_value: 2025-12-04T10:11:57.9159315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9159439Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9159566Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9159966Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9160028Z graph_break [] 2025-12-04T10:11:57.9160155Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9160244Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9160370Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9160711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9160771Z graph_break [] 2025-12-04T10:11:57.9161267Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml - 2025-12-04T10:11:57.9161370Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9162706Z FAILED [0.4048s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9162712Z 2025-12-04T10:11:57.9162840Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9163363Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9163428Z 2025-12-04T10:11:57.9163588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9163696Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9163819Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ================== 2025-12-04T10:11:57.9163878Z Got exit code 1 2025-12-04T10:11:57.9163945Z Retrying single test... 2025-12-04T10:11:57.9164210Z W1204 10:06:12.736000 62803 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9164599Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml 2025-12-04T10:11:57.9164700Z ============================= test session starts ============================== 2025-12-04T10:11:57.9164914Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9164989Z cachedir: .pytest_cache 2025-12-04T10:11:57.9165304Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9165381Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9165453Z configfile: pytest.ini 2025-12-04T10:11:57.9165773Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9165908Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9166519Z stepcurrent: skipping 50 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9166593Z Running 1 items in this shard 2025-12-04T10:11:57.9166629Z 2025-12-04T10:11:57.9167371Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:13.794462794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9167376Z 2025-12-04T10:11:57.9167676Z [W1204 10:06:23.019741268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9167679Z 2025-12-04T10:11:57.9167974Z [W1204 10:06:23.019977622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9167979Z 2025-12-04T10:11:57.9168265Z [W1204 10:06:23.025720799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9168268Z 2025-12-04T10:11:57.9168556Z [W1204 10:06:23.026256069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9168562Z 2025-12-04T10:11:57.9168882Z [W1204 10:06:23.026421983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9168886Z 2025-12-04T10:11:57.9169178Z [W1204 10:06:23.031703191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9169182Z 2025-12-04T10:11:57.9169466Z [W1204 10:06:23.032209951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9169469Z 2025-12-04T10:11:57.9169755Z [W1204 10:06:23.032408764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9169762Z 2025-12-04T10:11:57.9169842Z ('RERUN', {'yellow': True}) [11.0903s] [100%] 2025-12-04T10:11:57.9170565Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:24.206161433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9170602Z 2025-12-04T10:11:57.9170902Z [W1204 10:06:24.206684962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9170906Z 2025-12-04T10:11:57.9171196Z [W1204 10:06:24.206825775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9171200Z 2025-12-04T10:11:57.9171490Z [W1204 10:06:24.209765580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9171493Z 2025-12-04T10:11:57.9171782Z [W1204 10:06:24.210333101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9171789Z 2025-12-04T10:11:57.9172076Z [W1204 10:06:24.210475553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9172080Z 2025-12-04T10:11:57.9172365Z [W1204 10:06:24.214955626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9172369Z 2025-12-04T10:11:57.9172658Z [W1204 10:06:24.215403904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9172661Z 2025-12-04T10:11:57.9172984Z [W1204 10:06:24.215541057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9172987Z 2025-12-04T10:11:57.9173068Z ('RERUN', {'yellow': True}) [0.4188s] [100%] 2025-12-04T10:11:57.9173792Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:24.623284164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9173831Z 2025-12-04T10:11:57.9174118Z [W1204 10:06:24.623800884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9174121Z 2025-12-04T10:11:57.9174408Z [W1204 10:06:24.623944016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9174412Z 2025-12-04T10:11:57.9174699Z [W1204 10:06:24.626813530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9174702Z 2025-12-04T10:11:57.9174988Z [W1204 10:06:24.627353890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9174994Z 2025-12-04T10:11:57.9175279Z [W1204 10:06:24.627495742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9175315Z 2025-12-04T10:11:57.9175603Z [W1204 10:06:24.631968036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9175606Z 2025-12-04T10:11:57.9175891Z [W1204 10:06:24.632428425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9175894Z 2025-12-04T10:11:57.9176180Z [W1204 10:06:24.632570247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9176188Z 2025-12-04T10:11:57.9176259Z FAILED [0.4142s] [100%] 2025-12-04T10:11:57.9176262Z 2025-12-04T10:11:57.9176347Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9176685Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9176759Z Traceback (most recent call last): 2025-12-04T10:11:57.9177071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9177141Z method(*args, **kwargs) 2025-12-04T10:11:57.9177433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9177497Z method(*args, **kwargs) 2025-12-04T10:11:57.9177787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9177845Z with policy(): 2025-12-04T10:11:57.9178150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9178219Z raise RuntimeError(msg) 2025-12-04T10:11:57.9179024Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9179028Z 2025-12-04T10:11:57.9179159Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9179717Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9179725Z 2025-12-04T10:11:57.9179885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9180011Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9180148Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9180497Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9180624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9180695Z graph_break [] 2025-12-04T10:11:57.9180822Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9181521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9181590Z if out == self.unknown_value: 2025-12-04T10:11:57.9181878Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9181958Z Traceback (most recent call last): 2025-12-04T10:11:57.9182255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9182359Z method(*args, **kwargs) 2025-12-04T10:11:57.9182654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9182716Z method(*args, **kwargs) 2025-12-04T10:11:57.9183009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9183067Z with policy(): 2025-12-04T10:11:57.9183360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9183430Z raise RuntimeError(msg) 2025-12-04T10:11:57.9184243Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9184284Z 2025-12-04T10:11:57.9184417Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9184937Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9184941Z 2025-12-04T10:11:57.9185101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9185224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9185315Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9185669Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9185801Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9185858Z graph_break [] 2025-12-04T10:11:57.9185984Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9186671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9186742Z if out == self.unknown_value: 2025-12-04T10:11:57.9186898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9186989Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9187115Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9187567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9187630Z graph_break [] 2025-12-04T10:11:57.9187724Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9188016Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9188093Z Traceback (most recent call last): 2025-12-04T10:11:57.9188388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9188453Z method(*args, **kwargs) 2025-12-04T10:11:57.9188745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9188806Z method(*args, **kwargs) 2025-12-04T10:11:57.9189098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9189158Z with policy(): 2025-12-04T10:11:57.9189510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9189583Z raise RuntimeError(msg) 2025-12-04T10:11:57.9190396Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9190401Z 2025-12-04T10:11:57.9190529Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9191049Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9191088Z 2025-12-04T10:11:57.9191247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9191371Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9191460Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9191800Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9191920Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9191978Z graph_break [] 2025-12-04T10:11:57.9192115Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9192801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9192876Z if out == self.unknown_value: 2025-12-04T10:11:57.9192997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9193085Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9193211Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9193549Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9193611Z graph_break [] 2025-12-04T10:11:57.9193765Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9193853Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9193981Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9194358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9194417Z graph_break [] 2025-12-04T10:11:57.9194909Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml - 2025-12-04T10:11:57.9195007Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9196297Z FAILED [0.4142s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9196304Z 2025-12-04T10:11:57.9196427Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9196983Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9196988Z 2025-12-04T10:11:57.9197143Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9197246Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9197372Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.9197430Z Got exit code 1 2025-12-04T10:11:57.9197907Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9198183Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9198445Z W1204 10:06:31.240000 62996 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9198836Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml 2025-12-04T10:11:57.9198928Z ============================= test session starts ============================== 2025-12-04T10:11:57.9199137Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9199204Z cachedir: .pytest_cache 2025-12-04T10:11:57.9199509Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9199587Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9199653Z configfile: pytest.ini 
2025-12-04T10:11:57.9200007Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9200152Z collecting ... collected 58 items / 51 deselected / 7 selected 2025-12-04T10:11:57.9200241Z stepcurrent: skipping 51 already run items. 2025-12-04T10:11:57.9200312Z Running 7 items in this shard 2025-12-04T10:11:57.9200316Z 2025-12-04T10:11:57.9200815Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9319s] [ 14%] 2025-12-04T10:11:57.9201334Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5388s] [ 14%] 2025-12-04T10:11:57.9201780Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5597s] [ 14%] 2025-12-04T10:11:57.9201818Z 2025-12-04T10:11:57.9201903Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9202204Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9202280Z Traceback (most recent call last): 2025-12-04T10:11:57.9202589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9202653Z method(*args, **kwargs) 2025-12-04T10:11:57.9202947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9203015Z method(*args, **kwargs) 2025-12-04T10:11:57.9203304Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9203367Z with policy(): 2025-12-04T10:11:57.9203670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9203734Z raise RuntimeError(msg) 2025-12-04T10:11:57.9204571Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9204576Z 2025-12-04T10:11:57.9204701Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9205220Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9205228Z 2025-12-04T10:11:57.9205385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9205546Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9205644Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9206192Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9206324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9206381Z graph_break [] 2025-12-04T10:11:57.9206669Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:11:57.9206747Z Traceback (most recent call last): 2025-12-04T10:11:57.9207050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9207114Z method(*args, **kwargs) 2025-12-04T10:11:57.9207407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9207470Z method(*args, **kwargs) 2025-12-04T10:11:57.9207761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9207818Z with policy(): 2025-12-04T10:11:57.9208111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9208178Z raise RuntimeError(msg) 2025-12-04T10:11:57.9209022Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9209060Z 2025-12-04T10:11:57.9209186Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9209706Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9209709Z 2025-12-04T10:11:57.9209864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9209989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9210082Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9210629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9210754Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9210810Z graph_break [] 2025-12-04T10:11:57.9210938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9211060Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9211180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9211719Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9211779Z graph_break [] 2025-12-04T10:11:57.9211862Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9212146Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9212255Z 
Traceback (most recent call last): 2025-12-04T10:11:57.9212552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9212615Z method(*args, **kwargs) 2025-12-04T10:11:57.9212917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9212980Z method(*args, **kwargs) 2025-12-04T10:11:57.9213269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9213338Z with policy(): 2025-12-04T10:11:57.9213639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9213704Z raise RuntimeError(msg) 2025-12-04T10:11:57.9214519Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9214526Z 2025-12-04T10:11:57.9214649Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9215167Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9215170Z 2025-12-04T10:11:57.9215326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9215489Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9215579Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9216117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9216296Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9216355Z graph_break [] 2025-12-04T10:11:57.9216478Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9216566Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9216684Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9217386Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9217447Z graph_break [] 2025-12-04T10:11:57.9217566Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9217657Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9217775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:11:57.9218382Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9218441Z graph_break [] 2025-12-04T10:11:57.9218929Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml - 2025-12-04T10:11:57.9219033Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9220320Z FAILED [0.5597s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9220379Z 2025-12-04T10:11:57.9220503Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9221018Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9221022Z 2025-12-04T10:11:57.9221182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9221287Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9221409Z ================== 1 failed, 51 deselected, 2 rerun in 3.05s =================== 2025-12-04T10:11:57.9221474Z Got exit code 1 2025-12-04T10:11:57.9221542Z Retrying single test... 
2025-12-04T10:11:57.9221807Z W1204 10:06:40.897000 63185 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9222191Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml 2025-12-04T10:11:57.9222285Z ============================= test session starts ============================== 2025-12-04T10:11:57.9222494Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9222609Z cachedir: .pytest_cache 2025-12-04T10:11:57.9222918Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9222993Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9223102Z configfile: pytest.ini 2025-12-04T10:11:57.9223422Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9223551Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9224125Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9224197Z Running 1 items in this shard 2025-12-04T10:11:57.9224200Z 2025-12-04T10:11:57.9224936Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:06:42.482388401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9224940Z 2025-12-04T10:11:57.9225239Z [W1204 10:06:51.540583224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9225245Z 2025-12-04T10:11:57.9225572Z [W1204 10:06:51.540848839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9225576Z 2025-12-04T10:11:57.9225864Z [W1204 10:06:51.546980614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9225867Z 2025-12-04T10:11:57.9226154Z [W1204 10:06:51.547564244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9226158Z 2025-12-04T10:11:57.9226442Z [W1204 10:06:51.547747038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9226446Z 2025-12-04T10:11:57.9226739Z [W1204 10:06:51.553126270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9226782Z 2025-12-04T10:11:57.9227073Z [W1204 10:06:51.553669109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9227077Z 2025-12-04T10:11:57.9227363Z [W1204 10:06:51.553828662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9227367Z 2025-12-04T10:11:57.9227451Z ('RERUN', {'yellow': True}) [11.0030s] [100%] 2025-12-04T10:11:57.9228184Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:06:52.362854420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9228188Z 2025-12-04T10:11:57.9228478Z [W1204 10:06:52.363377209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9228484Z 2025-12-04T10:11:57.9228770Z [W1204 10:06:52.363517842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9228774Z 2025-12-04T10:11:57.9229065Z [W1204 10:06:52.366490043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9229068Z 2025-12-04T10:11:57.9229351Z [W1204 10:06:52.366955681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9229355Z 2025-12-04T10:11:57.9229676Z [W1204 10:06:52.367092183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9229683Z 2025-12-04T10:11:57.9229969Z [W1204 10:06:52.371801745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9230006Z 2025-12-04T10:11:57.9230292Z [W1204 10:06:52.372269282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9230296Z 2025-12-04T10:11:57.9230585Z [W1204 10:06:52.372418445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9230588Z 2025-12-04T10:11:57.9230666Z ('RERUN', {'yellow': True}) [0.5032s] [100%] 2025-12-04T10:11:57.9231393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:06:52.862613014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9231397Z 2025-12-04T10:11:57.9231682Z [W1204 10:06:52.863139553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9231688Z 2025-12-04T10:11:57.9231978Z [W1204 10:06:52.863278995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9232013Z 2025-12-04T10:11:57.9232300Z [W1204 10:06:52.866259027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9232303Z 2025-12-04T10:11:57.9232590Z [W1204 10:06:52.866716695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9232594Z 2025-12-04T10:11:57.9232881Z [W1204 10:06:52.866855777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9232885Z 2025-12-04T10:11:57.9233171Z [W1204 10:06:52.871576988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9233211Z 2025-12-04T10:11:57.9233496Z [W1204 10:06:52.872053757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9233500Z 2025-12-04T10:11:57.9233788Z [W1204 10:06:52.872191459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9233791Z 2025-12-04T10:11:57.9233854Z FAILED [0.4999s] [100%] 2025-12-04T10:11:57.9233857Z 2025-12-04T10:11:57.9233938Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9234231Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9234304Z Traceback (most recent call last): 2025-12-04T10:11:57.9234608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9234677Z method(*args, **kwargs) 2025-12-04T10:11:57.9234968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9235031Z method(*args, **kwargs) 2025-12-04T10:11:57.9235323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9235381Z with policy(): 2025-12-04T10:11:57.9235674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9235739Z raise RuntimeError(msg) 2025-12-04T10:11:57.9236576Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9236617Z 2025-12-04T10:11:57.9236746Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9237268Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9237272Z 2025-12-04T10:11:57.9237437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9237562Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9237658Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9238212Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9238340Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9238401Z graph_break [] 2025-12-04T10:11:57.9238523Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9239248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9239319Z if out == self.unknown_value: 2025-12-04T10:11:57.9239608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9239684Z Traceback (most recent call last): 2025-12-04T10:11:57.9240027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9240091Z method(*args, **kwargs) 2025-12-04T10:11:57.9240387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9240487Z method(*args, **kwargs) 2025-12-04T10:11:57.9240780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9240837Z with policy(): 2025-12-04T10:11:57.9241129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9241198Z raise RuntimeError(msg) 2025-12-04T10:11:57.9242011Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9242015Z 2025-12-04T10:11:57.9242142Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9242664Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9242668Z 2025-12-04T10:11:57.9242824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9242952Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9243044Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9243640Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9243766Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9243857Z graph_break [] 2025-12-04T10:11:57.9243985Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9244677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9244749Z if out == self.unknown_value: 2025-12-04T10:11:57.9244866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9244954Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9245079Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9245617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9245680Z graph_break [] 2025-12-04T10:11:57.9245761Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9246082Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9246159Z Traceback (most recent call last): 2025-12-04T10:11:57.9246461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9246524Z method(*args, **kwargs) 2025-12-04T10:11:57.9246819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9246879Z method(*args, **kwargs) 2025-12-04T10:11:57.9247174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9247230Z with policy(): 2025-12-04T10:11:57.9247524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9247626Z raise RuntimeError(msg) 2025-12-04T10:11:57.9248444Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9248448Z 2025-12-04T10:11:57.9248573Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9249090Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9249094Z 2025-12-04T10:11:57.9249250Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9249375Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9249465Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9250005Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9250128Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9250197Z graph_break [] 2025-12-04T10:11:57.9250356Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9251042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9251148Z if out == self.unknown_value: 2025-12-04T10:11:57.9251270Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9251358Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9251484Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9252021Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9252082Z graph_break [] 2025-12-04T10:11:57.9252203Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9252289Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9252409Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9252942Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9253005Z graph_break [] 2025-12-04T10:11:57.9253522Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml - 2025-12-04T10:11:57.9253623Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9254923Z FAILED [0.4999s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9254963Z 2025-12-04T10:11:57.9255088Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9255610Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9255614Z 2025-12-04T10:11:57.9255771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9255877Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9255995Z ================== 1 failed, 57 deselected, 2 rerun in 12.03s ================== 2025-12-04T10:11:57.9256051Z Got exit code 1 2025-12-04T10:11:57.9256116Z Retrying single test... 2025-12-04T10:11:57.9256379Z W1204 10:06:59.467000 63379 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9256765Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml 2025-12-04T10:11:57.9256861Z ============================= test session starts ============================== 2025-12-04T10:11:57.9257063Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9257128Z cachedir: .pytest_cache 2025-12-04T10:11:57.9257428Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9257503Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9257605Z configfile: pytest.ini 2025-12-04T10:11:57.9257919Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9258043Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9258651Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9258722Z Running 1 items in this shard 2025-12-04T10:11:57.9258726Z 2025-12-04T10:11:57.9259456Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:07:01.053437328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9259459Z 2025-12-04T10:11:57.9259756Z [W1204 10:07:10.171037329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9259760Z 2025-12-04T10:11:57.9260048Z [W1204 10:07:10.171293284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9260054Z 2025-12-04T10:11:57.9260338Z [W1204 10:07:10.177385679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9260380Z 2025-12-04T10:11:57.9260671Z [W1204 10:07:10.177993179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9260674Z 2025-12-04T10:11:57.9260961Z [W1204 10:07:10.178175352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9260965Z 2025-12-04T10:11:57.9261253Z [W1204 10:07:10.183636186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9261256Z 2025-12-04T10:11:57.9261541Z [W1204 10:07:10.184182616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9261579Z 2025-12-04T10:11:57.9261864Z [W1204 10:07:10.184362609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9261867Z 2025-12-04T10:11:57.9261952Z ('RERUN', {'yellow': True}) [11.0608s] [100%] 2025-12-04T10:11:57.9262676Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:07:11.984501376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9262680Z 2025-12-04T10:11:57.9262974Z [W1204 10:07:11.985012075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9262977Z 2025-12-04T10:11:57.9263268Z [W1204 10:07:11.985157687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9263274Z 2025-12-04T10:11:57.9263563Z [W1204 10:07:11.988012786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9263568Z 2025-12-04T10:11:57.9263856Z [W1204 10:07:11.988468864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9263859Z 2025-12-04T10:11:57.9264146Z [W1204 10:07:11.988607256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9264149Z 2025-12-04T10:11:57.9264467Z [W1204 10:07:11.993064913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9264470Z 2025-12-04T10:11:57.9264759Z [W1204 10:07:11.993516891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9264798Z 2025-12-04T10:11:57.9265087Z [W1204 10:07:11.993652863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9265090Z 2025-12-04T10:11:57.9265169Z ('RERUN', {'yellow': True}) [0.4940s] [100%] 2025-12-04T10:11:57.9265887Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:07:11.475107606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9265890Z 2025-12-04T10:11:57.9266176Z [W1204 10:07:11.475619505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9266179Z 2025-12-04T10:11:57.9266467Z [W1204 10:07:11.475764247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9266473Z 2025-12-04T10:11:57.9266757Z [W1204 10:07:11.478611526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9266760Z 2025-12-04T10:11:57.9267085Z [W1204 10:07:11.479061044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9267088Z 2025-12-04T10:11:57.9267375Z [W1204 10:07:11.479201167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9267378Z 2025-12-04T10:11:57.9267672Z [W1204 10:07:11.483654383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9267675Z 2025-12-04T10:11:57.9267961Z [W1204 10:07:11.484113961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9268026Z 2025-12-04T10:11:57.9268315Z [W1204 10:07:11.484251294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9268324Z 2025-12-04T10:11:57.9268385Z FAILED [0.4922s] [100%] 2025-12-04T10:11:57.9268389Z 2025-12-04T10:11:57.9268470Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9268764Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9268835Z Traceback (most recent call last): 2025-12-04T10:11:57.9269148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9269215Z method(*args, **kwargs) 2025-12-04T10:11:57.9269508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9269574Z method(*args, **kwargs) 2025-12-04T10:11:57.9269863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9269922Z with policy(): 2025-12-04T10:11:57.9270220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9270283Z raise RuntimeError(msg) 2025-12-04T10:11:57.9271124Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9271131Z 2025-12-04T10:11:57.9271262Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9271783Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9271821Z 2025-12-04T10:11:57.9271982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9272115Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9272215Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9272762Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9272889Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9272948Z graph_break [] 2025-12-04T10:11:57.9273072Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9273773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9273878Z if out == self.unknown_value: 2025-12-04T10:11:57.9274169Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9274244Z Traceback (most recent call last): 2025-12-04T10:11:57.9274538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9274603Z method(*args, **kwargs) 2025-12-04T10:11:57.9274892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9274952Z method(*args, **kwargs) 2025-12-04T10:11:57.9275244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9275340Z with policy(): 2025-12-04T10:11:57.9275636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9275702Z raise RuntimeError(msg) 2025-12-04T10:11:57.9276511Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9276517Z 2025-12-04T10:11:57.9276642Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9277160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9277167Z 2025-12-04T10:11:57.9277326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9277449Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9277539Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9278082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9278240Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9278297Z graph_break [] 2025-12-04T10:11:57.9278420Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9279106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9279211Z if out == self.unknown_value: 2025-12-04T10:11:57.9279330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9279420Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9279543Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9280130Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9280191Z graph_break [] 2025-12-04T10:11:57.9280272Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9280558Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9280639Z Traceback (most recent call last): 2025-12-04T10:11:57.9280972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9281037Z method(*args, **kwargs) 2025-12-04T10:11:57.9281327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9281388Z method(*args, **kwargs) 2025-12-04T10:11:57.9281679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9281737Z with policy(): 2025-12-04T10:11:57.9282030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9282096Z raise RuntimeError(msg) 2025-12-04T10:11:57.9282911Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9282951Z 2025-12-04T10:11:57.9283076Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9283593Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9283597Z 2025-12-04T10:11:57.9283753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9283874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9283976Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9284521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9284647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9284705Z graph_break [] 2025-12-04T10:11:57.9284824Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9285542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9285614Z if out == self.unknown_value: 2025-12-04T10:11:57.9285733Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9285821Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9285977Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9286520Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9286579Z graph_break [] 2025-12-04T10:11:57.9286699Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9286785Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9286908Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9287440Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9287501Z graph_break [] 2025-12-04T10:11:57.9287990Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml - 2025-12-04T10:11:57.9288124Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9289415Z FAILED [0.4922s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9289419Z 2025-12-04T10:11:57.9289542Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9290062Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9290099Z 2025-12-04T10:11:57.9290253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9290357Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9290472Z ================== 1 failed, 57 deselected, 2 rerun in 12.07s ================== 2025-12-04T10:11:57.9290529Z Got exit code 1 2025-12-04T10:11:57.9291008Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9291250Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9291512Z W1204 10:07:18.111000 63572 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9291902Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml 2025-12-04T10:11:57.9291996Z ============================= test session starts ============================== 2025-12-04T10:11:57.9292204Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9292269Z cachedir: .pytest_cache 2025-12-04T10:11:57.9292573Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9292682Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9292749Z configfile: pytest.ini 2025-12-04T10:11:57.9293073Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9293237Z collecting ... collected 58 items / 52 deselected / 6 selected 2025-12-04T10:11:57.9293326Z stepcurrent: skipping 52 already run items. 2025-12-04T10:11:57.9293397Z Running 6 items in this shard 2025-12-04T10:11:57.9293401Z 2025-12-04T10:11:57.9293902Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8971s] [ 16%] 2025-12-04T10:11:57.9294396Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4827s] [ 16%] 2025-12-04T10:11:57.9294839Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4744s] [ 16%] 2025-12-04T10:11:57.9294842Z 2025-12-04T10:11:57.9294926Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9295219Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9295291Z Traceback (most recent call last): 2025-12-04T10:11:57.9295651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9295716Z method(*args, **kwargs) 2025-12-04T10:11:57.9296010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:11:57.9296079Z method(*args, **kwargs) 2025-12-04T10:11:57.9296366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9296428Z with policy(): 2025-12-04T10:11:57.9296721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9296824Z raise RuntimeError(msg) 2025-12-04T10:11:57.9297642Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9297647Z 2025-12-04T10:11:57.9297773Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9298306Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9298310Z 2025-12-04T10:11:57.9298466Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9298592Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9298689Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9299040Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9299169Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9299225Z graph_break [] 2025-12-04T10:11:57.9299726Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 
2025-12-04T10:11:57.9299808Z Traceback (most recent call last): 2025-12-04T10:11:57.9300247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9300319Z method(*args, **kwargs) 2025-12-04T10:11:57.9300612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9300712Z method(*args, **kwargs) 2025-12-04T10:11:57.9301003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9301070Z with policy(): 2025-12-04T10:11:57.9301368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9301436Z raise RuntimeError(msg) 2025-12-04T10:11:57.9302259Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9302263Z 2025-12-04T10:11:57.9302398Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9302919Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9302925Z 2025-12-04T10:11:57.9303121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9303252Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9303345Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9303691Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9303819Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9303876Z graph_break [] 2025-12-04T10:11:57.9304001Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9304087Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9304247Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9304589Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9304647Z graph_break [] 2025-12-04T10:11:57.9304743Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9305038Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9305115Z Traceback (most recent call last): 2025-12-04T10:11:57.9305415Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9305478Z method(*args, **kwargs) 2025-12-04T10:11:57.9305772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9305837Z method(*args, **kwargs) 2025-12-04T10:11:57.9306125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9306186Z with policy(): 2025-12-04T10:11:57.9306480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9306550Z raise RuntimeError(msg) 2025-12-04T10:11:57.9307402Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9307407Z 2025-12-04T10:11:57.9307536Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9308059Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9308107Z 2025-12-04T10:11:57.9308269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9308399Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9308488Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9308826Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9308954Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9309010Z graph_break [] 2025-12-04T10:11:57.9309136Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9309222Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9309344Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9309718Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9309775Z graph_break [] 2025-12-04T10:11:57.9309898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9309984Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9310103Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9310441Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9310497Z graph_break [] 2025-12-04T10:11:57.9310983Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml - 2025-12-04T10:11:57.9311124Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9312418Z FAILED [0.4744s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9312424Z 2025-12-04T10:11:57.9312550Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9313069Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9313076Z 2025-12-04T10:11:57.9313234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9313346Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9313464Z ================== 1 failed, 52 deselected, 2 rerun in 2.88s =================== 2025-12-04T10:11:57.9313523Z Got exit code 1 2025-12-04T10:11:57.9313587Z Retrying single test... 
2025-12-04T10:11:57.9313855Z W1204 10:07:27.817000 63760 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9314273Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml 2025-12-04T10:11:57.9314370Z ============================= test session starts ============================== 2025-12-04T10:11:57.9314578Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9314683Z cachedir: .pytest_cache 2025-12-04T10:11:57.9314994Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9315070Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9315136Z configfile: pytest.ini 2025-12-04T10:11:57.9315454Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9315585Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9316155Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9316229Z Running 1 items in this shard 2025-12-04T10:11:57.9316233Z 2025-12-04T10:11:57.9317184Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:28.906628038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9317192Z 2025-12-04T10:11:57.9317515Z [W1204 10:07:38.945596298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9317519Z 2025-12-04T10:11:57.9317815Z [W1204 10:07:38.945849963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9317819Z 2025-12-04T10:11:57.9318124Z [W1204 10:07:38.951664283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9318127Z 2025-12-04T10:11:57.9318415Z [W1204 10:07:38.952206972 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9318484Z 2025-12-04T10:11:57.9318780Z [W1204 10:07:38.952392835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9318784Z 2025-12-04T10:11:57.9319071Z [W1204 10:07:38.957888760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9319075Z 2025-12-04T10:11:57.9319359Z [W1204 10:07:38.958406729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9319365Z 2025-12-04T10:11:57.9319655Z [W1204 10:07:38.958572142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9319658Z 2025-12-04T10:11:57.9319740Z ('RERUN', {'yellow': True}) [10.9365s] [100%] 2025-12-04T10:11:57.9320571Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:39.173261433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9320578Z 2025-12-04T10:11:57.9320870Z [W1204 10:07:39.173790392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9320873Z 2025-12-04T10:11:57.9321162Z [W1204 10:07:39.173936125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9321166Z 2025-12-04T10:11:57.9321506Z [W1204 10:07:39.176935076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9321510Z 2025-12-04T10:11:57.9321804Z [W1204 10:07:39.177499366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9321872Z 2025-12-04T10:11:57.9322160Z [W1204 10:07:39.177638219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9322165Z 2025-12-04T10:11:57.9322453Z [W1204 10:07:39.182331929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9322456Z 2025-12-04T10:11:57.9322741Z [W1204 10:07:39.182798837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9322744Z 2025-12-04T10:11:57.9323029Z [W1204 10:07:39.182937510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9323035Z 2025-12-04T10:11:57.9323115Z ('RERUN', {'yellow': True}) [0.4570s] [100%] 2025-12-04T10:11:57.9323843Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:39.627971816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9323898Z 2025-12-04T10:11:57.9324191Z [W1204 10:07:39.628511676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9324194Z 2025-12-04T10:11:57.9324478Z [W1204 10:07:39.628652668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9324481Z 2025-12-04T10:11:57.9324772Z [W1204 10:07:39.631660340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9324775Z 2025-12-04T10:11:57.9325059Z [W1204 10:07:39.632221800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9325098Z 2025-12-04T10:11:57.9325386Z [W1204 10:07:39.632368882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9325389Z 2025-12-04T10:11:57.9325676Z [W1204 10:07:39.637012922 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9325679Z 2025-12-04T10:11:57.9325967Z [W1204 10:07:39.637481570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9325970Z 2025-12-04T10:11:57.9326256Z [W1204 10:07:39.637618813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9326259Z 2025-12-04T10:11:57.9326329Z FAILED [0.4524s] [100%] 2025-12-04T10:11:57.9326333Z 2025-12-04T10:11:57.9326420Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9326722Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9326798Z Traceback (most recent call last): 2025-12-04T10:11:57.9327112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9327176Z method(*args, **kwargs) 2025-12-04T10:11:57.9327475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9327539Z method(*args, **kwargs) 2025-12-04T10:11:57.9327874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9327938Z with policy(): 2025-12-04T10:11:57.9328231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9328336Z raise RuntimeError(msg) 2025-12-04T10:11:57.9329145Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9329149Z 2025-12-04T10:11:57.9329281Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9329807Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9329810Z 2025-12-04T10:11:57.9329968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9330101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9330198Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9330587Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9330726Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9330787Z graph_break [] 2025-12-04T10:11:57.9330917Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9331613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9331684Z if out == self.unknown_value: 2025-12-04T10:11:57.9331978Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9332087Z Traceback (most recent call last): 2025-12-04T10:11:57.9332391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9332454Z method(*args, **kwargs) 2025-12-04T10:11:57.9332748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9332813Z method(*args, **kwargs) 2025-12-04T10:11:57.9333104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9333167Z with policy(): 2025-12-04T10:11:57.9333462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9333527Z raise RuntimeError(msg) 2025-12-04T10:11:57.9334351Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9334358Z 2025-12-04T10:11:57.9334485Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9335011Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9335014Z 2025-12-04T10:11:57.9335204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9335329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9335425Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9335768Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9335933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9335991Z graph_break [] 2025-12-04T10:11:57.9336115Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9336811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9336881Z if out == self.unknown_value: 2025-12-04T10:11:57.9337006Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9337096Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9337220Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9337567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9337627Z graph_break [] 2025-12-04T10:11:57.9337743Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9338041Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9338120Z Traceback (most recent call last): 2025-12-04T10:11:57.9338422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9338486Z method(*args, **kwargs) 2025-12-04T10:11:57.9338781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9338850Z method(*args, **kwargs) 2025-12-04T10:11:57.9339141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9339232Z with policy(): 2025-12-04T10:11:57.9339530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9339594Z raise RuntimeError(msg) 2025-12-04T10:11:57.9340419Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9340424Z 2025-12-04T10:11:57.9340547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9341076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9341082Z 2025-12-04T10:11:57.9341237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9341359Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9341451Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9341791Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9341921Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9342019Z graph_break [] 2025-12-04T10:11:57.9342144Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9342830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9342934Z if out == self.unknown_value: 2025-12-04T10:11:57.9343055Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9343144Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9343261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9343603Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9343661Z graph_break [] 2025-12-04T10:11:57.9343782Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9343875Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9343993Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9344336Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9344394Z graph_break [] 2025-12-04T10:11:57.9344911Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml - 2025-12-04T10:11:57.9345015Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9346320Z FAILED [0.4524s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9346359Z 2025-12-04T10:11:57.9346483Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9347004Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9347008Z 2025-12-04T10:11:57.9347164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9347267Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9347380Z ================== 1 failed, 57 deselected, 2 rerun in 11.87s ================== 2025-12-04T10:11:57.9347443Z Got exit code 1 2025-12-04T10:11:57.9347506Z Retrying single test... 2025-12-04T10:11:57.9347770Z W1204 10:07:46.224000 63953 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9348163Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml 2025-12-04T10:11:57.9348263Z ============================= test session starts ============================== 2025-12-04T10:11:57.9348470Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9348534Z cachedir: .pytest_cache 2025-12-04T10:11:57.9348840Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9348921Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9348987Z configfile: pytest.ini 2025-12-04T10:11:57.9349370Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9349509Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9350123Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9350198Z Running 1 items in this shard 2025-12-04T10:11:57.9350202Z 2025-12-04T10:11:57.9350937Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:47.319603078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9350941Z 2025-12-04T10:11:57.9351244Z [W1204 10:07:56.477468724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9351248Z 2025-12-04T10:11:57.9351539Z [W1204 10:07:56.477722819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9351546Z 2025-12-04T10:11:57.9351834Z [W1204 10:07:56.483524839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9351837Z 2025-12-04T10:11:57.9352156Z [W1204 10:07:56.484111629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9352160Z 2025-12-04T10:11:57.9352450Z [W1204 10:07:56.484273472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9352457Z 2025-12-04T10:11:57.9352747Z [W1204 10:07:56.489737056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9352750Z 2025-12-04T10:11:57.9353032Z [W1204 10:07:56.490289755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9353070Z 2025-12-04T10:11:57.9353359Z [W1204 10:07:56.490455908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9353362Z 2025-12-04T10:11:57.9353445Z ('RERUN', {'yellow': True}) [11.0591s] [100%] 2025-12-04T10:11:57.9354177Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:57.703441072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9354181Z 2025-12-04T10:11:57.9354469Z [W1204 10:07:57.703979682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9354472Z 2025-12-04T10:11:57.9354758Z [W1204 10:07:57.704127564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9354763Z 2025-12-04T10:11:57.9355051Z [W1204 10:07:57.707133417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9355054Z 2025-12-04T10:11:57.9355345Z [W1204 10:07:57.707704536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9355348Z 2025-12-04T10:11:57.9355637Z [W1204 10:07:57.707844749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9355640Z 2025-12-04T10:11:57.9355967Z [W1204 10:07:57.712448249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9355975Z 2025-12-04T10:11:57.9356264Z [W1204 10:07:57.712920207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9356300Z 2025-12-04T10:11:57.9356590Z [W1204 10:07:57.713059949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9356594Z 2025-12-04T10:11:57.9356678Z ('RERUN', {'yellow': True}) [0.4528s] [100%] 2025-12-04T10:11:57.9357403Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:58.153173195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9357406Z 2025-12-04T10:11:57.9357692Z [W1204 10:07:58.153695885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9357696Z 2025-12-04T10:11:57.9357981Z [W1204 10:07:58.153835957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9357986Z 2025-12-04T10:11:57.9358287Z [W1204 10:07:58.156777539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9358290Z 2025-12-04T10:11:57.9358612Z [W1204 10:07:58.157337568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9358615Z 2025-12-04T10:11:57.9358900Z [W1204 10:07:58.157479601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9358906Z 2025-12-04T10:11:57.9359193Z [W1204 10:07:58.162035471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9359196Z 2025-12-04T10:11:57.9359483Z [W1204 10:07:58.162493299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9359487Z 2025-12-04T10:11:57.9359776Z [W1204 10:07:58.162630621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9359816Z 2025-12-04T10:11:57.9359934Z FAILED [0.4479s] [100%] 2025-12-04T10:11:57.9359938Z 2025-12-04T10:11:57.9360028Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9360329Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9360405Z Traceback (most recent call last): 2025-12-04T10:11:57.9360721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9360787Z method(*args, **kwargs) 2025-12-04T10:11:57.9361084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9361149Z method(*args, **kwargs) 2025-12-04T10:11:57.9361441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9361508Z with policy(): 2025-12-04T10:11:57.9361811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9361876Z raise RuntimeError(msg) 2025-12-04T10:11:57.9362732Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9362737Z 2025-12-04T10:11:57.9362866Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9363394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9363432Z 2025-12-04T10:11:57.9363595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9363725Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9363819Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9364169Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9364301Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9364360Z graph_break [] 2025-12-04T10:11:57.9364488Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9365180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9365252Z if out == self.unknown_value: 2025-12-04T10:11:57.9365584Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9365658Z Traceback (most recent call last): 2025-12-04T10:11:57.9365956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9366020Z method(*args, **kwargs) 2025-12-04T10:11:57.9366316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9366380Z method(*args, **kwargs) 2025-12-04T10:11:57.9366670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9366731Z with policy(): 2025-12-04T10:11:57.9367067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9367135Z raise RuntimeError(msg) 2025-12-04T10:11:57.9367973Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9367977Z 2025-12-04T10:11:57.9368103Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9368628Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9368633Z 2025-12-04T10:11:57.9368792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9368915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9369012Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9369359Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9369484Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9369544Z graph_break [] 2025-12-04T10:11:57.9369666Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9370620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9370725Z if out == self.unknown_value: 2025-12-04T10:11:57.9370848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9370941Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9371064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9371408Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9371465Z graph_break [] 2025-12-04T10:11:57.9371546Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9371840Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9371913Z Traceback (most recent call last): 2025-12-04T10:11:57.9372209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9372277Z method(*args, **kwargs) 2025-12-04T10:11:57.9372569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9372672Z method(*args, **kwargs) 2025-12-04T10:11:57.9372962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9373020Z with policy(): 2025-12-04T10:11:57.9373316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9373380Z raise RuntimeError(msg) 2025-12-04T10:11:57.9374204Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9374267Z 2025-12-04T10:11:57.9374397Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9374920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9374929Z 2025-12-04T10:11:57.9375084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9375208Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9375302Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9375644Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9375767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9375830Z graph_break [] 2025-12-04T10:11:57.9375950Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9376639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9376707Z if out == self.unknown_value: 2025-12-04T10:11:57.9376828Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9376922Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9377077Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9377419Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9377512Z graph_break [] 2025-12-04T10:11:57.9377633Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9377725Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9377845Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9378181Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9378240Z graph_break [] 2025-12-04T10:11:57.9378725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml - 2025-12-04T10:11:57.9378828Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9380169Z FAILED [0.4479s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9380178Z 2025-12-04T10:11:57.9380312Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9380831Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9380835Z 2025-12-04T10:11:57.9380989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9381094Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9381209Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ================== 2025-12-04T10:11:57.9381308Z Got exit code 1 2025-12-04T10:11:57.9381781Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9382025Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9382289Z W1204 10:08:04.740000 64146 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9382675Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml 2025-12-04T10:11:57.9382773Z ============================= test session starts ============================== 2025-12-04T10:11:57.9382979Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9383047Z cachedir: .pytest_cache 2025-12-04T10:11:57.9383353Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9383429Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9383495Z configfile: 
pytest.ini 2025-12-04T10:11:57.9383813Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9383942Z collecting ... collected 58 items / 53 deselected / 5 selected 2025-12-04T10:11:57.9384030Z stepcurrent: skipping 53 already run items. 2025-12-04T10:11:57.9384133Z Running 5 items in this shard 2025-12-04T10:11:57.9384137Z 2025-12-04T10:11:57.9384633Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8357s] [ 20%] 2025-12-04T10:11:57.9385155Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4588s] [ 20%] 2025-12-04T10:11:57.9385602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4621s] [ 20%] 2025-12-04T10:11:57.9385607Z 2025-12-04T10:11:57.9385691Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9385984Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9386057Z Traceback (most recent call last): 2025-12-04T10:11:57.9386365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9386429Z method(*args, **kwargs) 2025-12-04T10:11:57.9386726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9386789Z method(*args, **kwargs) 2025-12-04T10:11:57.9387121Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9387185Z with policy(): 2025-12-04T10:11:57.9387485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9387552Z raise RuntimeError(msg) 2025-12-04T10:11:57.9388355Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9388360Z 2025-12-04T10:11:57.9388486Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9389041Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9389045Z 2025-12-04T10:11:57.9389201Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9389330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9389424Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9389773Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9389900Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9389957Z graph_break [] 2025-12-04T10:11:57.9390249Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9390323Z Traceback (most recent call last): 2025-12-04T10:11:57.9390625Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9390688Z method(*args, **kwargs) 2025-12-04T10:11:57.9390979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9391042Z method(*args, **kwargs) 2025-12-04T10:11:57.9391336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9391429Z with policy(): 2025-12-04T10:11:57.9391730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9391793Z raise RuntimeError(msg) 2025-12-04T10:11:57.9392638Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9392645Z 2025-12-04T10:11:57.9392767Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9393283Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9393287Z 2025-12-04T10:11:57.9393446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9393569Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9393664Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9394007Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9394167Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9394228Z graph_break [] 2025-12-04T10:11:57.9394348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9394436Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9394560Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9394900Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9394964Z graph_break [] 2025-12-04T10:11:57.9395054Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9395343Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9395455Z Traceback (most recent call last): 2025-12-04T10:11:57.9395754Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9395816Z method(*args, **kwargs) 2025-12-04T10:11:57.9396116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9396176Z method(*args, **kwargs) 2025-12-04T10:11:57.9396470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9396533Z with policy(): 2025-12-04T10:11:57.9396824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9396892Z raise RuntimeError(msg) 2025-12-04T10:11:57.9397704Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9397710Z 2025-12-04T10:11:57.9397834Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9398347Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9398351Z 2025-12-04T10:11:57.9398540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9398661Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9398748Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9399125Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9399250Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9399305Z graph_break [] 2025-12-04T10:11:57.9399430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9399520Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9399645Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9400046Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9400105Z graph_break [] 2025-12-04T10:11:57.9400226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9400313Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9400433Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9400829Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9400887Z graph_break [] 2025-12-04T10:11:57.9401375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml - 2025-12-04T10:11:57.9401474Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9402769Z FAILED [0.4621s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9402813Z 2025-12-04T10:11:57.9402940Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9403455Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9403462Z 2025-12-04T10:11:57.9403617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9403721Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9403842Z ================== 1 failed, 53 deselected, 2 rerun in 2.78s =================== 2025-12-04T10:11:57.9403898Z Got exit code 1 2025-12-04T10:11:57.9403963Z Retrying single test... 
2025-12-04T10:11:57.9404229Z W1204 10:08:14.379000 64327 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9404618Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml 2025-12-04T10:11:57.9404714Z ============================= test session starts ============================== 2025-12-04T10:11:57.9404920Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9404984Z cachedir: .pytest_cache 2025-12-04T10:11:57.9405331Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9405406Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9405469Z configfile: pytest.ini 2025-12-04T10:11:57.9405784Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9405948Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9406522Z stepcurrent: skipping 53 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9406591Z Running 1 items in this shard 2025-12-04T10:11:57.9406595Z 2025-12-04T10:11:57.9407319Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:15.440951911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9407326Z 2025-12-04T10:11:57.9407623Z [W1204 10:08:24.201539478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9407629Z 2025-12-04T10:11:57.9407918Z [W1204 10:08:24.201789443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9407922Z 2025-12-04T10:11:57.9408247Z [W1204 10:08:24.207577313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9408251Z 2025-12-04T10:11:57.9408538Z [W1204 10:08:24.208195434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9408541Z 2025-12-04T10:11:57.9408832Z [W1204 10:08:24.208389067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9408836Z 2025-12-04T10:11:57.9409122Z [W1204 10:08:24.213915003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9409126Z 2025-12-04T10:11:57.9409413Z [W1204 10:08:24.214451362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9409526Z 2025-12-04T10:11:57.9409813Z [W1204 10:08:24.214610445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9409817Z 2025-12-04T10:11:57.9409900Z ('RERUN', {'yellow': True}) [10.6268s] [100%] 2025-12-04T10:11:57.9410624Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:25.386902177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9410628Z 2025-12-04T10:11:57.9410914Z [W1204 10:08:25.387462466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9410922Z 2025-12-04T10:11:57.9411209Z [W1204 10:08:25.387607779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9411213Z 2025-12-04T10:11:57.9411500Z [W1204 10:08:25.390597100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9411504Z 2025-12-04T10:11:57.9411807Z [W1204 10:08:25.391184891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9411810Z 2025-12-04T10:11:57.9412134Z [W1204 10:08:25.391325313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9412138Z 2025-12-04T10:11:57.9412430Z [W1204 10:08:25.395850121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9412433Z 2025-12-04T10:11:57.9412754Z [W1204 10:08:25.396315579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9412759Z 2025-12-04T10:11:57.9413050Z [W1204 10:08:25.396453611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9413053Z 2025-12-04T10:11:57.9413131Z ('RERUN', {'yellow': True}) [0.4157s] [100%] 2025-12-04T10:11:57.9413849Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:25.801566644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9413856Z 2025-12-04T10:11:57.9414143Z [W1204 10:08:25.802139494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9414146Z 2025-12-04T10:11:57.9414433Z [W1204 10:08:25.802282327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9414438Z 2025-12-04T10:11:57.9414763Z [W1204 10:08:25.805195437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9414767Z 2025-12-04T10:11:57.9415052Z [W1204 10:08:25.805766467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9415055Z 2025-12-04T10:11:57.9415342Z [W1204 10:08:25.805906769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9415347Z 2025-12-04T10:11:57.9415632Z [W1204 10:08:25.810369686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9415635Z 2025-12-04T10:11:57.9415923Z [W1204 10:08:25.810829724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9415962Z 2025-12-04T10:11:57.9416250Z [W1204 10:08:25.810965426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9416253Z 2025-12-04T10:11:57.9416314Z FAILED [0.4102s] [100%] 2025-12-04T10:11:57.9416317Z 2025-12-04T10:11:57.9416401Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9416696Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9416778Z Traceback (most recent call last): 2025-12-04T10:11:57.9417264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9417334Z method(*args, **kwargs) 2025-12-04T10:11:57.9417637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9417705Z method(*args, **kwargs) 2025-12-04T10:11:57.9418002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9418061Z with policy(): 2025-12-04T10:11:57.9418356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9418425Z raise RuntimeError(msg) 2025-12-04T10:11:57.9419287Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9419292Z 2025-12-04T10:11:57.9419428Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9419998Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9420004Z 2025-12-04T10:11:57.9420167Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9420299Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9420396Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9420744Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9420872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9420939Z graph_break [] 2025-12-04T10:11:57.9421073Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9421818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9421895Z if out == self.unknown_value: 2025-12-04T10:11:57.9422188Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9422261Z Traceback (most recent call last): 2025-12-04T10:11:57.9422562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9422626Z method(*args, **kwargs) 2025-12-04T10:11:57.9422923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9422987Z method(*args, **kwargs) 2025-12-04T10:11:57.9423283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9423392Z with policy(): 2025-12-04T10:11:57.9423689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9423754Z raise RuntimeError(msg) 2025-12-04T10:11:57.9424568Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9424573Z 2025-12-04T10:11:57.9424702Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9425224Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9425231Z 2025-12-04T10:11:57.9425386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9425513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9425607Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9425950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9426081Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9426175Z graph_break [] 2025-12-04T10:11:57.9426300Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9427002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9427124Z if out == self.unknown_value: 2025-12-04T10:11:57.9427251Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9427342Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9427462Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9427810Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9427867Z graph_break [] 2025-12-04T10:11:57.9427956Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9428245Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9428319Z Traceback (most recent call last): 2025-12-04T10:11:57.9428627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9428690Z method(*args, **kwargs) 2025-12-04T10:11:57.9429016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9429082Z method(*args, **kwargs) 2025-12-04T10:11:57.9429371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9429434Z with policy(): 2025-12-04T10:11:57.9429730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9429794Z raise RuntimeError(msg) 2025-12-04T10:11:57.9430612Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9430652Z 2025-12-04T10:11:57.9430776Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9431304Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9431308Z 2025-12-04T10:11:57.9431467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9431597Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9431688Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9432024Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9432153Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9432210Z graph_break [] 2025-12-04T10:11:57.9432332Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9433022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9433089Z if out == self.unknown_value: 2025-12-04T10:11:57.9433252Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9433344Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9433463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9433805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9433900Z graph_break [] 2025-12-04T10:11:57.9434024Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9434113Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9434231Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9434573Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9434629Z graph_break [] 2025-12-04T10:11:57.9435116Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml - 2025-12-04T10:11:57.9435219Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9436542Z FAILED [0.4102s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9436552Z 2025-12-04T10:11:57.9436678Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9437195Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9437199Z 2025-12-04T10:11:57.9437355Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9437461Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9437609Z ================== 1 failed, 57 deselected, 2 rerun in 11.48s ================== 2025-12-04T10:11:57.9437671Z Got exit code 1 2025-12-04T10:11:57.9437735Z Retrying single test... 2025-12-04T10:11:57.9438000Z W1204 10:08:32.409000 64513 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9438388Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml 2025-12-04T10:11:57.9438482Z ============================= test session starts ============================== 2025-12-04T10:11:57.9438693Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9438759Z cachedir: .pytest_cache 2025-12-04T10:11:57.9439067Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9439144Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9439208Z configfile: pytest.ini 2025-12-04T10:11:57.9439526Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9439656Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9440276Z stepcurrent: skipping 53 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9440388Z Running 1 items in this shard 2025-12-04T10:11:57.9440391Z 2025-12-04T10:11:57.9441120Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:33.449508012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9441158Z 2025-12-04T10:11:57.9441464Z [W1204 10:08:42.526948746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9441468Z 2025-12-04T10:11:57.9441756Z [W1204 10:08:42.527199120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9441760Z 2025-12-04T10:11:57.9442052Z [W1204 10:08:42.533029290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9442055Z 2025-12-04T10:11:57.9442346Z [W1204 10:08:42.533625471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9442349Z 2025-12-04T10:11:57.9442637Z [W1204 10:08:42.533799394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9442643Z 2025-12-04T10:11:57.9442932Z [W1204 10:08:42.539234608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9442969Z 2025-12-04T10:11:57.9443260Z [W1204 10:08:42.539757906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9443264Z 2025-12-04T10:11:57.9443562Z [W1204 10:08:42.539914159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9443566Z 2025-12-04T10:11:57.9443648Z ('RERUN', {'yellow': True}) [10.9186s] [100%] 2025-12-04T10:11:57.9444371Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:43.704099851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9444414Z 2025-12-04T10:11:57.9444702Z [W1204 10:08:43.704664431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9444706Z 2025-12-04T10:11:57.9444996Z [W1204 10:08:43.704802854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9444999Z 2025-12-04T10:11:57.9445283Z [W1204 10:08:43.707666513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9445286Z 2025-12-04T10:11:57.9445583Z [W1204 10:08:43.708212622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9445586Z 2025-12-04T10:11:57.9445872Z [W1204 10:08:43.708358605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9445878Z 2025-12-04T10:11:57.9446168Z [W1204 10:08:43.712832662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9446172Z 2025-12-04T10:11:57.9446458Z [W1204 10:08:43.713287830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9446461Z 2025-12-04T10:11:57.9446745Z [W1204 10:08:43.713423143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9446751Z 2025-12-04T10:11:57.9446834Z ('RERUN', {'yellow': True}) [0.4129s] [100%] 2025-12-04T10:11:57.9447587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:44.115626285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9447624Z 2025-12-04T10:11:57.9447915Z [W1204 10:08:44.116180114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9447919Z 2025-12-04T10:11:57.9448206Z [W1204 10:08:44.116332177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9448209Z 2025-12-04T10:11:57.9448496Z [W1204 10:08:44.119211277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9448499Z 2025-12-04T10:11:57.9448786Z [W1204 10:08:44.119765796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9448789Z 2025-12-04T10:11:57.9449077Z [W1204 10:08:44.119903388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9449084Z 2025-12-04T10:11:57.9449372Z [W1204 10:08:44.124383315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9449375Z 2025-12-04T10:11:57.9449696Z [W1204 10:08:44.124833163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9449700Z 2025-12-04T10:11:57.9449986Z [W1204 10:08:44.124967326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9449989Z 2025-12-04T10:11:57.9450049Z FAILED [0.4051s] [100%] 2025-12-04T10:11:57.9450052Z 2025-12-04T10:11:57.9450140Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9450429Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9450509Z Traceback (most recent call last): 2025-12-04T10:11:57.9450854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9450921Z method(*args, **kwargs) 2025-12-04T10:11:57.9451221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9451283Z method(*args, **kwargs) 2025-12-04T10:11:57.9451572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9451633Z with policy(): 2025-12-04T10:11:57.9451933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9451999Z raise RuntimeError(msg) 2025-12-04T10:11:57.9452800Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9452807Z 2025-12-04T10:11:57.9452936Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9453455Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9453458Z 2025-12-04T10:11:57.9453615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9453796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9453892Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9454236Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9454410Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9454471Z graph_break [] 2025-12-04T10:11:57.9454599Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9455295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9455362Z if out == self.unknown_value: 2025-12-04T10:11:57.9455656Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9455729Z Traceback (most recent call last): 2025-12-04T10:11:57.9456029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9456095Z method(*args, **kwargs) 2025-12-04T10:11:57.9456392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9456457Z method(*args, **kwargs) 2025-12-04T10:11:57.9456783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9456847Z with policy(): 2025-12-04T10:11:57.9457141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9457205Z raise RuntimeError(msg) 2025-12-04T10:11:57.9458025Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9458064Z 2025-12-04T10:11:57.9458191Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9458708Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9458712Z 2025-12-04T10:11:57.9458866Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9458987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9459086Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9459432Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9459558Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9459616Z graph_break [] 2025-12-04T10:11:57.9459740Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9460433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9460500Z if out == self.unknown_value: 2025-12-04T10:11:57.9460628Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9460720Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9460879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9461224Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9461282Z graph_break [] 2025-12-04T10:11:57.9461407Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9461698Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9461770Z Traceback (most recent call last): 2025-12-04T10:11:57.9462075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9462136Z method(*args, **kwargs) 2025-12-04T10:11:57.9462430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9462495Z method(*args, **kwargs) 2025-12-04T10:11:57.9462783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9462843Z with policy(): 2025-12-04T10:11:57.9463143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9463222Z raise RuntimeError(msg) 2025-12-04T10:11:57.9464079Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9464083Z 2025-12-04T10:11:57.9464206Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9464725Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9464728Z 2025-12-04T10:11:57.9464881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9465003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9465134Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9465478Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9465604Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9465660Z graph_break [] 2025-12-04T10:11:57.9465781Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9466473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9466538Z if out == self.unknown_value: 2025-12-04T10:11:57.9466668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9466768Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9466887Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9467232Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9467290Z graph_break [] 2025-12-04T10:11:57.9467412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9467507Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9467628Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9468003Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9468063Z graph_break [] 2025-12-04T10:11:57.9468585Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml - 2025-12-04T10:11:57.9468689Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9469974Z FAILED [0.4051s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9469978Z 2025-12-04T10:11:57.9470106Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9470619Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9470626Z 2025-12-04T10:11:57.9470816Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9470920Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9471037Z ================== 1 failed, 57 deselected, 2 rerun in 11.76s ================== 2025-12-04T10:11:57.9471099Z Got exit code 1 2025-12-04T10:11:57.9471572Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9471816Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9472079Z W1204 10:08:50.658000 64699 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9472499Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml 2025-12-04T10:11:57.9472599Z ============================= test session starts ============================== 2025-12-04T10:11:57.9472804Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9472868Z cachedir: .pytest_cache 2025-12-04T10:11:57.9473174Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9473248Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9473318Z configfile: pytest.ini 
2025-12-04T10:11:57.9473632Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9473758Z collecting ... collected 58 items / 54 deselected / 4 selected 2025-12-04T10:11:57.9473852Z stepcurrent: skipping 54 already run items. 2025-12-04T10:11:57.9473921Z Running 4 items in this shard 2025-12-04T10:11:57.9473924Z 2025-12-04T10:11:57.9474429Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8644s] [ 25%] 2025-12-04T10:11:57.9474914Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4495s] [ 25%] 2025-12-04T10:11:57.9475393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4478s] [ 25%] 2025-12-04T10:11:57.9475397Z 2025-12-04T10:11:57.9475481Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9475805Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9475884Z Traceback (most recent call last): 2025-12-04T10:11:57.9476190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9476252Z method(*args, **kwargs) 2025-12-04T10:11:57.9476552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9476613Z method(*args, **kwargs) 2025-12-04T10:11:57.9476910Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9476968Z with policy(): 2025-12-04T10:11:57.9477265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9477343Z raise RuntimeError(msg) 2025-12-04T10:11:57.9478183Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9478187Z 2025-12-04T10:11:57.9478314Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9478828Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9478834Z 2025-12-04T10:11:57.9478989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9479117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9479207Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9479608Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9479735Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9479793Z graph_break [] 2025-12-04T10:11:57.9480136Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9480210Z Traceback (most recent call last): 2025-12-04T10:11:57.9480511Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9480579Z method(*args, **kwargs) 2025-12-04T10:11:57.9480872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9480939Z method(*args, **kwargs) 2025-12-04T10:11:57.9481232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9481290Z with policy(): 2025-12-04T10:11:57.9481593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9481657Z raise RuntimeError(msg) 2025-12-04T10:11:57.9482512Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9482517Z 2025-12-04T10:11:57.9482643Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9483155Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9483201Z 2025-12-04T10:11:57.9483355Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9483480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9483573Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9483918Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9484040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9484105Z graph_break [] 2025-12-04T10:11:57.9484226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9484318Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9484442Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9484786Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9484848Z graph_break [] 2025-12-04T10:11:57.9484969Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9485257Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9485333Z Traceback (most recent call last): 2025-12-04T10:11:57.9485627Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9485695Z method(*args, **kwargs) 2025-12-04T10:11:57.9485989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9486049Z method(*args, **kwargs) 2025-12-04T10:11:57.9486345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9486438Z with policy(): 2025-12-04T10:11:57.9486734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9486804Z raise RuntimeError(msg) 2025-12-04T10:11:57.9487611Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9487615Z 2025-12-04T10:11:57.9487740Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9488250Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9488256Z 2025-12-04T10:11:57.9488415Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9488541Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9488630Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9488976Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9489098Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9489196Z graph_break [] 2025-12-04T10:11:57.9489319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9489407Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9489541Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9489918Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9489975Z graph_break [] 2025-12-04T10:11:57.9490100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9490185Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9490305Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9490643Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9490698Z graph_break [] 2025-12-04T10:11:57.9491184Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml - 2025-12-04T10:11:57.9491287Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9492602Z FAILED [0.4478s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9492606Z 2025-12-04T10:11:57.9492729Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9493251Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9493256Z 2025-12-04T10:11:57.9493449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9493551Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9493670Z ================== 1 failed, 54 deselected, 2 rerun in 2.79s =================== 2025-12-04T10:11:57.9493801Z Got exit code 1 2025-12-04T10:11:57.9493900Z Retrying single test... 
2025-12-04T10:11:57.9494249Z W1204 10:09:00.275000 64880 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9494666Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml 2025-12-04T10:11:57.9494793Z ============================= test session starts ============================== 2025-12-04T10:11:57.9495267Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9499605Z cachedir: .pytest_cache 2025-12-04T10:11:57.9499978Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9500068Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9500141Z configfile: pytest.ini 2025-12-04T10:11:57.9500478Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9500624Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9501279Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9501361Z Running 1 items in this shard 2025-12-04T10:11:57.9501366Z 2025-12-04T10:11:57.9502104Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:01.577120863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9502148Z 2025-12-04T10:11:57.9502461Z [W1204 10:09:10.773823973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9502465Z 2025-12-04T10:11:57.9502755Z [W1204 10:09:10.774071868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9502759Z 2025-12-04T10:11:57.9503052Z [W1204 10:09:10.779838076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9503055Z 2025-12-04T10:11:57.9503341Z [W1204 10:09:10.780500227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9503345Z 2025-12-04T10:11:57.9503638Z [W1204 10:09:10.780677250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9503642Z 2025-12-04T10:11:57.9503963Z [W1204 10:09:10.786075073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9503967Z 2025-12-04T10:11:57.9504254Z [W1204 10:09:10.786595031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9504258Z 2025-12-04T10:11:57.9504549Z [W1204 10:09:10.786749924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9504552Z 2025-12-04T10:11:57.9504637Z ('RERUN', {'yellow': True}) [11.1185s] [100%] 2025-12-04T10:11:57.9505363Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:11.772564090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9505401Z 2025-12-04T10:11:57.9505691Z [W1204 10:09:11.773123570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9505694Z 2025-12-04T10:11:57.9505984Z [W1204 10:09:11.773268752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9505988Z 2025-12-04T10:11:57.9506278Z [W1204 10:09:11.776120391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9506281Z 2025-12-04T10:11:57.9506570Z [W1204 10:09:11.776682350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9506575Z 2025-12-04T10:11:57.9506860Z [W1204 10:09:11.776821333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9506865Z 2025-12-04T10:11:57.9507149Z [W1204 10:09:11.781256298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9507156Z 2025-12-04T10:11:57.9507440Z [W1204 10:09:11.781718706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9507443Z 2025-12-04T10:11:57.9507761Z [W1204 10:09:11.781853889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9507764Z 2025-12-04T10:11:57.9507845Z ('RERUN', {'yellow': True}) [0.4083s] [100%] 2025-12-04T10:11:57.9508564Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:12.178400151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9508602Z 2025-12-04T10:11:57.9508893Z [W1204 10:09:12.178970180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9508896Z 2025-12-04T10:11:57.9509182Z [W1204 10:09:12.179117183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9509185Z 2025-12-04T10:11:57.9509474Z [W1204 10:09:12.181988422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9509477Z 2025-12-04T10:11:57.9509760Z [W1204 10:09:12.182541141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9509763Z 2025-12-04T10:11:57.9510053Z [W1204 10:09:12.182683484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9510057Z 2025-12-04T10:11:57.9510381Z [W1204 10:09:12.187068759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9510384Z 2025-12-04T10:11:57.9510673Z [W1204 10:09:12.187517236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9510676Z 2025-12-04T10:11:57.9510961Z [W1204 10:09:12.187654959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9510967Z 2025-12-04T10:11:57.9511030Z FAILED [0.4037s] [100%] 2025-12-04T10:11:57.9511034Z 2025-12-04T10:11:57.9511125Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9511422Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9511546Z Traceback (most recent call last): 2025-12-04T10:11:57.9511867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9511932Z method(*args, **kwargs) 2025-12-04T10:11:57.9512229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9512290Z method(*args, **kwargs) 2025-12-04T10:11:57.9512577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9512642Z with policy(): 2025-12-04T10:11:57.9512947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9513016Z raise RuntimeError(msg) 2025-12-04T10:11:57.9513813Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9513819Z 2025-12-04T10:11:57.9513951Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9514478Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9514481Z 2025-12-04T10:11:57.9514682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9514821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9514917Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9515298Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9515433Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9515495Z graph_break [] 2025-12-04T10:11:57.9515625Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9516338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9516413Z if out == self.unknown_value: 2025-12-04T10:11:57.9516717Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9516796Z Traceback (most recent call last): 2025-12-04T10:11:57.9517313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9517388Z method(*args, **kwargs) 2025-12-04T10:11:57.9517759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9517828Z method(*args, **kwargs) 2025-12-04T10:11:57.9518117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9518176Z with policy(): 2025-12-04T10:11:57.9518473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9518539Z raise RuntimeError(msg) 2025-12-04T10:11:57.9519351Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9519411Z 2025-12-04T10:11:57.9519547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9520125Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9520131Z 2025-12-04T10:11:57.9520293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9520424Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9520521Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9520873Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9521004Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9521069Z graph_break [] 2025-12-04T10:11:57.9521195Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9521891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9521964Z if out == self.unknown_value: 2025-12-04T10:11:57.9522086Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9523906Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9524067Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9524433Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9524582Z graph_break [] 2025-12-04T10:11:57.9524682Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9524997Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9525083Z Traceback (most recent call last): 2025-12-04T10:11:57.9525408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9525483Z method(*args, **kwargs) 2025-12-04T10:11:57.9525790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9525874Z method(*args, **kwargs) 2025-12-04T10:11:57.9526168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9526230Z with policy(): 2025-12-04T10:11:57.9526525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9526597Z raise RuntimeError(msg) 2025-12-04T10:11:57.9527457Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9527463Z 2025-12-04T10:11:57.9527603Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9528141Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9528146Z 2025-12-04T10:11:57.9528312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9528450Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9528550Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9528906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9529040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9529102Z graph_break [] 2025-12-04T10:11:57.9529235Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9529941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9530019Z if out == self.unknown_value: 2025-12-04T10:11:57.9530144Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9530243Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9530376Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9530723Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9530785Z graph_break [] 2025-12-04T10:11:57.9530908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9531034Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9531219Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9531559Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9531655Z graph_break [] 2025-12-04T10:11:57.9532155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml - 2025-12-04T10:11:57.9532258Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9533549Z FAILED [0.4037s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9533556Z 2025-12-04T10:11:57.9533684Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9534211Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9534249Z 2025-12-04T10:11:57.9534408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9534513Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9534635Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ================== 2025-12-04T10:11:57.9534695Z Got exit code 1 2025-12-04T10:11:57.9534762Z Retrying single test... 2025-12-04T10:11:57.9535035Z W1204 10:09:18.784000 65066 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9535425Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml 2025-12-04T10:11:57.9535526Z ============================= test session starts ============================== 2025-12-04T10:11:57.9535739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9535809Z cachedir: .pytest_cache 2025-12-04T10:11:57.9536120Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9536197Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9536275Z configfile: pytest.ini 2025-12-04T10:11:57.9536592Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9536725Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9537296Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9537369Z Running 1 items in this shard 2025-12-04T10:11:57.9537372Z 2025-12-04T10:11:57.9538107Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:20.066412167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9538112Z 2025-12-04T10:11:57.9538411Z [W1204 10:09:29.070564571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9538416Z 2025-12-04T10:11:57.9538796Z [W1204 10:09:29.070818426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9538800Z 2025-12-04T10:11:57.9539088Z [W1204 10:09:29.077168165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9539127Z 2025-12-04T10:11:57.9539418Z [W1204 10:09:29.077761495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9539427Z 2025-12-04T10:11:57.9539714Z [W1204 10:09:29.077927928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9539717Z 2025-12-04T10:11:57.9540003Z [W1204 10:09:29.083409683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9540006Z 2025-12-04T10:11:57.9540312Z [W1204 10:09:29.083958322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9540316Z 2025-12-04T10:11:57.9540603Z [W1204 10:09:29.084116265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9540608Z 2025-12-04T10:11:57.9540693Z ('RERUN', {'yellow': True}) [10.9128s] [100%] 2025-12-04T10:11:57.9541450Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:30.079194891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9541455Z 2025-12-04T10:11:57.9541746Z [W1204 10:09:30.079761421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9541750Z 2025-12-04T10:11:57.9542039Z [W1204 10:09:30.079908963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9542042Z 2025-12-04T10:11:57.9542337Z [W1204 10:09:30.082839953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9542342Z 2025-12-04T10:11:57.9542628Z [W1204 10:09:30.083403853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9542633Z 2025-12-04T10:11:57.9542917Z [W1204 10:09:30.083546225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9542923Z 2025-12-04T10:11:57.9543207Z [W1204 10:09:30.087972051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9543211Z 2025-12-04T10:11:57.9543496Z [W1204 10:09:30.088434889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9543501Z 2025-12-04T10:11:57.9543791Z [W1204 10:09:30.088575941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9543795Z 2025-12-04T10:11:57.9543875Z ('RERUN', {'yellow': True}) [0.4185s] [100%] 2025-12-04T10:11:57.9544606Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:30.494506358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9544610Z 2025-12-04T10:11:57.9544904Z [W1204 10:09:30.495074797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9544908Z 2025-12-04T10:11:57.9545233Z [W1204 10:09:30.495224600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9545273Z 2025-12-04T10:11:57.9545571Z [W1204 10:09:30.498159521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9545607Z 2025-12-04T10:11:57.9545902Z [W1204 10:09:30.498728040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9545906Z 2025-12-04T10:11:57.9546200Z [W1204 10:09:30.498872043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9546203Z 2025-12-04T10:11:57.9546492Z [W1204 10:09:30.503430741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9546496Z 2025-12-04T10:11:57.9546787Z [W1204 10:09:30.503899300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9546792Z 2025-12-04T10:11:57.9547079Z [W1204 10:09:30.504038762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9547084Z 2025-12-04T10:11:57.9547151Z FAILED [0.4131s] [100%] 2025-12-04T10:11:57.9547154Z 2025-12-04T10:11:57.9547239Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9547572Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9547650Z Traceback (most recent call last): 2025-12-04T10:11:57.9547967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9548036Z method(*args, **kwargs) 2025-12-04T10:11:57.9548333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9548398Z method(*args, **kwargs) 2025-12-04T10:11:57.9548690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9548752Z with policy(): 2025-12-04T10:11:57.9549052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9549125Z raise RuntimeError(msg) 2025-12-04T10:11:57.9549922Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9549927Z 2025-12-04T10:11:57.9550061Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9550582Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9550586Z 2025-12-04T10:11:57.9550751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9550881Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9550980Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9551337Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9551466Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9551529Z graph_break [] 2025-12-04T10:11:57.9551654Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9552448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9552560Z if out == self.unknown_value: 2025-12-04T10:11:57.9552853Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9552930Z Traceback (most recent call last): 2025-12-04T10:11:57.9553235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9553300Z method(*args, **kwargs) 2025-12-04T10:11:57.9553601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9553663Z method(*args, **kwargs) 2025-12-04T10:11:57.9553954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9554020Z with policy(): 2025-12-04T10:11:57.9554314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9554386Z raise RuntimeError(msg) 2025-12-04T10:11:57.9555232Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9555237Z 2025-12-04T10:11:57.9555366Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9555888Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9555895Z 2025-12-04T10:11:57.9556052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9556180Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9556278Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9556629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9556755Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9556813Z graph_break [] 2025-12-04T10:11:57.9556940Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9557635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9557717Z if out == self.unknown_value: 2025-12-04T10:11:57.9557839Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9557932Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9558061Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9558405Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9558473Z graph_break [] 2025-12-04T10:11:57.9558564Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9558858Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9558935Z Traceback (most recent call last): 2025-12-04T10:11:57.9559317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9559384Z method(*args, **kwargs) 2025-12-04T10:11:57.9559683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9559782Z method(*args, **kwargs) 2025-12-04T10:11:57.9560199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9560265Z with policy(): 2025-12-04T10:11:57.9560572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9560641Z raise RuntimeError(msg) 2025-12-04T10:11:57.9561464Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9561470Z 2025-12-04T10:11:57.9561596Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9562122Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9562125Z 2025-12-04T10:11:57.9562324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9562457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9562550Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9562900Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9563031Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9563089Z graph_break [] 2025-12-04T10:11:57.9563217Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9563917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9563988Z if out == self.unknown_value: 2025-12-04T10:11:57.9564125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9564220Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9564346Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9564690Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9564750Z graph_break [] 2025-12-04T10:11:57.9564874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9564965Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9565086Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9565430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9565490Z graph_break [] 2025-12-04T10:11:57.9565982Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml - 2025-12-04T10:11:57.9566082Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9567415Z FAILED [0.4131s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9567488Z 2025-12-04T10:11:57.9567615Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9568141Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9568149Z 2025-12-04T10:11:57.9568307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9568412Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9568530Z ================== 1 failed, 57 deselected, 2 rerun in 11.77s ================== 2025-12-04T10:11:57.9568588Z Got exit code 1 2025-12-04T10:11:57.9569058Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9569351Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9569620Z W1204 10:09:37.084000 65252 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9570211Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml 2025-12-04T10:11:57.9570323Z ============================= test session starts ============================== 2025-12-04T10:11:57.9570540Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9570615Z cachedir: .pytest_cache 2025-12-04T10:11:57.9570923Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9571005Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9571071Z configfile: pytest.ini 
2025-12-04T10:11:57.9571389Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9571522Z collecting ... collected 58 items / 55 deselected / 3 selected 2025-12-04T10:11:57.9571609Z stepcurrent: skipping 55 already run items. 2025-12-04T10:11:57.9571680Z Running 3 items in this shard 2025-12-04T10:11:57.9571684Z 2025-12-04T10:11:57.9572194Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8789s] [ 33%] 2025-12-04T10:11:57.9572687Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4977s] [ 33%] 2025-12-04T10:11:57.9573135Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5175s] [ 33%] 2025-12-04T10:11:57.9573141Z 2025-12-04T10:11:57.9573226Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9573524Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9573599Z Traceback (most recent call last): 2025-12-04T10:11:57.9573972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9574082Z method(*args, **kwargs) 2025-12-04T10:11:57.9574381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9574481Z method(*args, **kwargs) 2025-12-04T10:11:57.9574773Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9574835Z with policy(): 2025-12-04T10:11:57.9575134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9575201Z raise RuntimeError(msg) 2025-12-04T10:11:57.9576008Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9576017Z 2025-12-04T10:11:57.9576148Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9576670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9576676Z 2025-12-04T10:11:57.9576838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9576999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9577099Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9577444Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9577576Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9577644Z graph_break [] 2025-12-04T10:11:57.9577944Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9578027Z Traceback (most recent call last): 2025-12-04T10:11:57.9578333Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9578397Z method(*args, **kwargs) 2025-12-04T10:11:57.9578694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9578757Z method(*args, **kwargs) 2025-12-04T10:11:57.9579046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9579110Z with policy(): 2025-12-04T10:11:57.9579405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9579473Z raise RuntimeError(msg) 2025-12-04T10:11:57.9580296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9580301Z 2025-12-04T10:11:57.9580429Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9580958Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9580962Z 2025-12-04T10:11:57.9581121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9581303Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9581435Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9581780Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9581943Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9582001Z graph_break [] 2025-12-04T10:11:57.9582129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9582218Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9582339Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9582680Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9582738Z graph_break [] 2025-12-04T10:11:57.9582824Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9583121Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9583194Z Traceback (most recent call last): 2025-12-04T10:11:57.9583497Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9583566Z method(*args, **kwargs) 2025-12-04T10:11:57.9583896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9583966Z method(*args, **kwargs) 2025-12-04T10:11:57.9584256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9584316Z with policy(): 2025-12-04T10:11:57.9584614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9584680Z raise RuntimeError(msg) 2025-12-04T10:11:57.9585508Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9585513Z 2025-12-04T10:11:57.9585649Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9586176Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9586180Z 2025-12-04T10:11:57.9586335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9586462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9586560Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9586905Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9587037Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9587094Z graph_break [] 2025-12-04T10:11:57.9587217Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9587309Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9587438Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9587781Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9587839Z graph_break [] 2025-12-04T10:11:57.9588040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9588127Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9588252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9588626Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9588684Z graph_break [] 2025-12-04T10:11:57.9589179Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml - 2025-12-04T10:11:57.9589280Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9590583Z FAILED [0.5175s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9590592Z 2025-12-04T10:11:57.9590715Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9591269Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9591278Z 2025-12-04T10:11:57.9591434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9591536Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9591657Z ================== 1 failed, 55 deselected, 2 rerun in 2.92s =================== 2025-12-04T10:11:57.9591717Z Got exit code 1 2025-12-04T10:11:57.9591781Z Retrying single test... 
2025-12-04T10:11:57.9592045Z W1204 10:09:46.794000 65441 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9592431Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml 2025-12-04T10:11:57.9592531Z ============================= test session starts ============================== 2025-12-04T10:11:57.9592739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9592805Z cachedir: .pytest_cache 2025-12-04T10:11:57.9593109Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9593186Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9593253Z configfile: pytest.ini 2025-12-04T10:11:57.9593572Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9593712Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9594292Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9594364Z Running 1 items in this shard 2025-12-04T10:11:57.9594368Z 2025-12-04T10:11:57.9595099Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:09:47.892578085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9595109Z 2025-12-04T10:11:57.9595446Z [W1204 10:09:57.068279538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9595486Z 2025-12-04T10:11:57.9595779Z [W1204 10:09:57.068558242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9595815Z 2025-12-04T10:11:57.9596108Z [W1204 10:09:57.075012090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9596112Z 2025-12-04T10:11:57.9596401Z [W1204 10:09:57.075584810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9596405Z 2025-12-04T10:11:57.9596694Z [W1204 10:09:57.075754193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9596697Z 2025-12-04T10:11:57.9596984Z [W1204 10:09:57.081245885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9596989Z 2025-12-04T10:11:57.9597278Z [W1204 10:09:57.081785454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9597283Z 2025-12-04T10:11:57.9597571Z [W1204 10:09:57.081942927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9597574Z 2025-12-04T10:11:57.9597698Z ('RERUN', {'yellow': True}) [11.0769s] [100%] 2025-12-04T10:11:57.9598424Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:09:58.288243831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9598428Z 2025-12-04T10:11:57.9598717Z [W1204 10:09:58.288780290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9598725Z 2025-12-04T10:11:57.9599012Z [W1204 10:09:58.288919532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9599016Z 2025-12-04T10:11:57.9599301Z [W1204 10:09:58.291906082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9599304Z 2025-12-04T10:11:57.9599594Z [W1204 10:09:58.292480081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9599597Z 2025-12-04T10:11:57.9599957Z [W1204 10:09:58.292619214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9599961Z 2025-12-04T10:11:57.9600258Z [W1204 10:09:58.297128598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9600264Z 2025-12-04T10:11:57.9600550Z [W1204 10:09:58.297588596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9600556Z 2025-12-04T10:11:57.9600845Z [W1204 10:09:58.297724198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9600848Z 2025-12-04T10:11:57.9600928Z ('RERUN', {'yellow': True}) [0.4529s] [100%] 2025-12-04T10:11:57.9601651Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:09:58.738491187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9601657Z 2025-12-04T10:11:57.9601985Z [W1204 10:09:58.739009685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9602025Z 2025-12-04T10:11:57.9602313Z [W1204 10:09:58.739149818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9602349Z 2025-12-04T10:11:57.9602641Z [W1204 10:09:58.741994985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9602645Z 2025-12-04T10:11:57.9602933Z [W1204 10:09:58.742536234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9602936Z 2025-12-04T10:11:57.9603231Z [W1204 10:09:58.742674617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9603234Z 2025-12-04T10:11:57.9603526Z [W1204 10:09:58.747022159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9603530Z 2025-12-04T10:11:57.9603820Z [W1204 10:09:58.747467676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9603825Z 2025-12-04T10:11:57.9604117Z [W1204 10:09:58.747610408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9604120Z 2025-12-04T10:11:57.9604187Z FAILED [0.4481s] [100%] 2025-12-04T10:11:57.9604243Z 2025-12-04T10:11:57.9604331Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9604628Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9604706Z Traceback (most recent call last): 2025-12-04T10:11:57.9605014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9605084Z method(*args, **kwargs) 2025-12-04T10:11:57.9605384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9605450Z method(*args, **kwargs) 2025-12-04T10:11:57.9605748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9605808Z with policy(): 2025-12-04T10:11:57.9606103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9606171Z raise RuntimeError(msg) 2025-12-04T10:11:57.9606979Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9606984Z 2025-12-04T10:11:57.9607113Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9607636Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9607641Z 2025-12-04T10:11:57.9607805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9607934Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9608036Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9608397Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9608568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9608741Z graph_break [] 2025-12-04T10:11:57.9608870Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9609560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9609668Z if out == self.unknown_value: 2025-12-04T10:11:57.9609965Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9610041Z Traceback (most recent call last): 2025-12-04T10:11:57.9610345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9610409Z method(*args, **kwargs) 2025-12-04T10:11:57.9610709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9610773Z method(*args, **kwargs) 2025-12-04T10:11:57.9611065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9611129Z with policy(): 2025-12-04T10:11:57.9611428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9611497Z raise RuntimeError(msg) 2025-12-04T10:11:57.9612352Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9612356Z 2025-12-04T10:11:57.9612484Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9613010Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9613015Z 2025-12-04T10:11:57.9613172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9613300Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9613393Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9613741Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9613872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9613932Z graph_break [] 2025-12-04T10:11:57.9614057Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9614751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9614823Z if out == self.unknown_value: 2025-12-04T10:11:57.9614950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9615038Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9615164Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9615510Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9615569Z graph_break [] 2025-12-04T10:11:57.9615656Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9615987Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9616099Z Traceback (most recent call last): 2025-12-04T10:11:57.9616402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9616499Z method(*args, **kwargs) 2025-12-04T10:11:57.9616789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9616857Z method(*args, **kwargs) 2025-12-04T10:11:57.9617339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9617405Z with policy(): 2025-12-04T10:11:57.9617699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9617764Z raise RuntimeError(msg) 2025-12-04T10:11:57.9618616Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9618622Z 2025-12-04T10:11:57.9618748Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9619343Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9619347Z 2025-12-04T10:11:57.9619510Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9619638Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9619730Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9620072Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9620200Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9620260Z graph_break [] 2025-12-04T10:11:57.9620381Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9621078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9621149Z if out == self.unknown_value: 2025-12-04T10:11:57.9621277Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9621367Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9621492Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9621836Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9621897Z graph_break [] 2025-12-04T10:11:57.9622022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9622110Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9622232Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9622572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9622630Z graph_break [] 2025-12-04T10:11:57.9623171Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml - 2025-12-04T10:11:57.9623328Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9624632Z FAILED [0.4481s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9624686Z 2025-12-04T10:11:57.9624811Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9625330Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9625336Z 2025-12-04T10:11:57.9625493Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9625597Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9625719Z ================== 1 failed, 57 deselected, 2 rerun in 12.00s ================== 2025-12-04T10:11:57.9625777Z Got exit code 1 2025-12-04T10:11:57.9625843Z Retrying single test... 2025-12-04T10:11:57.9626143Z W1204 10:10:05.333000 65634 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9626528Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml 2025-12-04T10:11:57.9626622Z ============================= test session starts ============================== 2025-12-04T10:11:57.9626833Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9626900Z cachedir: .pytest_cache 2025-12-04T10:11:57.9627205Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9627283Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9627348Z configfile: pytest.ini 2025-12-04T10:11:57.9627667Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9627798Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9628379Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9628454Z Running 1 items in this shard 2025-12-04T10:11:57.9628457Z 2025-12-04T10:11:57.9629189Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:10:06.422887374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9629196Z 2025-12-04T10:11:57.9629509Z [W1204 10:10:15.582057755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9629513Z 2025-12-04T10:11:57.9629806Z [W1204 10:10:15.582312549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9629809Z 2025-12-04T10:11:57.9630101Z [W1204 10:10:15.588201890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9630104Z 2025-12-04T10:11:57.9630432Z [W1204 10:10:15.588792500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9630472Z 2025-12-04T10:11:57.9630765Z [W1204 10:10:15.588958443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9630822Z 2025-12-04T10:11:57.9631109Z [W1204 10:10:15.594491878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9631112Z 2025-12-04T10:11:57.9631405Z [W1204 10:10:15.595024677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9631408Z 2025-12-04T10:11:57.9631693Z [W1204 10:10:15.595182590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9631696Z 2025-12-04T10:11:57.9631778Z ('RERUN', {'yellow': True}) [11.0542s] [100%] 2025-12-04T10:11:57.9632510Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:10:16.806290806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9632516Z 2025-12-04T10:11:57.9632802Z [W1204 10:10:16.806832016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9632805Z 2025-12-04T10:11:57.9633130Z [W1204 10:10:16.806975998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9633134Z 2025-12-04T10:11:57.9633421Z [W1204 10:10:16.809872858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9633424Z 2025-12-04T10:11:57.9633725Z [W1204 10:10:16.810452908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9633730Z 2025-12-04T10:11:57.9634019Z [W1204 10:10:16.810596890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9634022Z 2025-12-04T10:11:57.9634317Z [W1204 10:10:16.815034216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9634320Z 2025-12-04T10:11:57.9634606Z [W1204 10:10:16.815487474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9634609Z 2025-12-04T10:11:57.9634894Z [W1204 10:10:16.815623937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9634900Z 2025-12-04T10:11:57.9634979Z ('RERUN', {'yellow': True}) [0.4506s] [100%] 2025-12-04T10:11:57.9635700Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:10:17.254901241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9635706Z 2025-12-04T10:11:57.9635998Z [W1204 10:10:17.255417410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9636001Z 2025-12-04T10:11:57.9636289Z [W1204 10:10:17.255560802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9636292Z 2025-12-04T10:11:57.9636584Z [W1204 10:10:17.258397141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9636587Z 2025-12-04T10:11:57.9636873Z [W1204 10:10:17.258932500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9636947Z 2025-12-04T10:11:57.9637239Z [W1204 10:10:17.259069612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9637243Z 2025-12-04T10:11:57.9637528Z [W1204 10:10:17.263492198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9637564Z 2025-12-04T10:11:57.9637864Z [W1204 10:10:17.263939276 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9637867Z 2025-12-04T10:11:57.9638153Z [W1204 10:10:17.264075038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9638157Z 2025-12-04T10:11:57.9638218Z FAILED [0.4456s] [100%] 2025-12-04T10:11:57.9638221Z 2025-12-04T10:11:57.9638320Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9638623Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9638700Z Traceback (most recent call last): 2025-12-04T10:11:57.9639013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9639079Z method(*args, **kwargs) 2025-12-04T10:11:57.9639415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9639478Z method(*args, **kwargs) 2025-12-04T10:11:57.9639773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9639833Z with policy(): 2025-12-04T10:11:57.9640177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9640253Z raise RuntimeError(msg) 2025-12-04T10:11:57.9641061Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9641067Z 2025-12-04T10:11:57.9641199Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9641723Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9641727Z 2025-12-04T10:11:57.9641886Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9642029Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9642131Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9642482Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9642610Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9642669Z graph_break [] 2025-12-04T10:11:57.9642796Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9643489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9643564Z if out == self.unknown_value: 2025-12-04T10:11:57.9643860Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9644023Z Traceback (most recent call last): 2025-12-04T10:11:57.9644326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9644390Z method(*args, **kwargs) 2025-12-04T10:11:57.9644716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9644783Z method(*args, **kwargs) 2025-12-04T10:11:57.9645073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9645135Z with policy(): 2025-12-04T10:11:57.9645428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9645494Z raise RuntimeError(msg) 2025-12-04T10:11:57.9646318Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9646324Z 2025-12-04T10:11:57.9646450Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9647009Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9647014Z 2025-12-04T10:11:57.9647172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9647297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9647394Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9647740Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9647872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9647929Z graph_break [] 2025-12-04T10:11:57.9648052Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9648745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9648815Z if out == self.unknown_value: 2025-12-04T10:11:57.9648939Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9649030Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9649152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9649499Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9649558Z graph_break [] 2025-12-04T10:11:57.9649643Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9649940Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:11:57.9650015Z Traceback (most recent call last): 2025-12-04T10:11:57.9650318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9650383Z method(*args, **kwargs) 2025-12-04T10:11:57.9650677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9650741Z method(*args, **kwargs) 2025-12-04T10:11:57.9651072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9651173Z with policy(): 2025-12-04T10:11:57.9651479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9651579Z raise RuntimeError(msg) 2025-12-04T10:11:57.9652402Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9652406Z 2025-12-04T10:11:57.9652532Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9653058Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9653063Z 2025-12-04T10:11:57.9653217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9653342Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9653436Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9653777Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9653936Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9653997Z graph_break [] 2025-12-04T10:11:57.9654121Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9654811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9654881Z if out == self.unknown_value: 2025-12-04T10:11:57.9655004Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9655094Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9655220Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9655565Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9655624Z graph_break [] 2025-12-04T10:11:57.9655744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9655837Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9655957Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9656303Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9656362Z graph_break [] 2025-12-04T10:11:57.9656850Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml - 2025-12-04T10:11:57.9656955Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9658415Z FAILED [0.4456s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9658480Z 2025-12-04T10:11:57.9658632Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9659255Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9659298Z 2025-12-04T10:11:57.9659489Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9659613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9659748Z ================== 1 failed, 57 deselected, 2 rerun in 11.97s ================== 2025-12-04T10:11:57.9659822Z Got exit code 1 2025-12-04T10:11:57.9660314Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:11:57.9660561Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9660836Z W1204 10:10:23.888000 65827 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9661228Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml 2025-12-04T10:11:57.9661327Z ============================= test session starts ============================== 2025-12-04T10:11:57.9661569Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9661637Z cachedir: .pytest_cache 2025-12-04T10:11:57.9661945Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9662021Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9662088Z configfile: 
pytest.ini 2025-12-04T10:11:57.9662405Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9662535Z collecting ... collected 58 items / 56 deselected / 2 selected 2025-12-04T10:11:57.9662626Z stepcurrent: skipping 56 already run items. 2025-12-04T10:11:57.9662697Z Running 2 items in this shard 2025-12-04T10:11:57.9662700Z 2025-12-04T10:11:57.9663199Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8403s] [ 50%] 2025-12-04T10:11:57.9663682Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4514s] [ 50%] 2025-12-04T10:11:57.9664126Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4392s] [ 50%] 2025-12-04T10:11:57.9664135Z 2025-12-04T10:11:57.9664218Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9664507Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9664587Z Traceback (most recent call last): 2025-12-04T10:11:57.9664896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9664972Z method(*args, **kwargs) 2025-12-04T10:11:57.9665274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9665339Z method(*args, **kwargs) 2025-12-04T10:11:57.9665630Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9665729Z with policy(): 2025-12-04T10:11:57.9666059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9666129Z raise RuntimeError(msg) 2025-12-04T10:11:57.9666927Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9666964Z 2025-12-04T10:11:57.9667094Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9667629Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9667633Z 2025-12-04T10:11:57.9667821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9667977Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9668094Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9668508Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9668660Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9668769Z graph_break [] 2025-12-04T10:11:57.9669113Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9669199Z Traceback (most recent call last): 2025-12-04T10:11:57.9669544Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9669608Z method(*args, **kwargs) 2025-12-04T10:11:57.9669904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9669971Z method(*args, **kwargs) 2025-12-04T10:11:57.9670259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9670319Z with policy(): 2025-12-04T10:11:57.9670619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9670686Z raise RuntimeError(msg) 2025-12-04T10:11:57.9671496Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9671500Z 2025-12-04T10:11:57.9671627Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9672144Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9672152Z 2025-12-04T10:11:57.9672308Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9672432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9672531Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9672876Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9673002Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9673064Z graph_break [] 2025-12-04T10:11:57.9673225Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9673350Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9673470Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9673845Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9673909Z graph_break [] 2025-12-04T10:11:57.9673993Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9674280Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9674358Z Traceback (most recent call last): 2025-12-04T10:11:57.9674657Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9674724Z method(*args, **kwargs) 2025-12-04T10:11:57.9675014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9675088Z method(*args, **kwargs) 2025-12-04T10:11:57.9675384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9675446Z with policy(): 2025-12-04T10:11:57.9675778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9675846Z raise RuntimeError(msg) 2025-12-04T10:11:57.9676649Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9676652Z 2025-12-04T10:11:57.9676782Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9677296Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9677301Z 2025-12-04T10:11:57.9677458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9677589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9677683Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9678028Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9678151Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9678215Z graph_break [] 2025-12-04T10:11:57.9678339Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9678428Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9678555Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9678895Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9678953Z graph_break [] 2025-12-04T10:11:57.9679079Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9679168Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9679292Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9679633Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9679692Z graph_break [] 2025-12-04T10:11:57.9680302Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml - 2025-12-04T10:11:57.9680404Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9681719Z FAILED [0.4392s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9681723Z 2025-12-04T10:11:57.9681846Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9682371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9682377Z 2025-12-04T10:11:57.9682528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9682634Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9682755Z ================== 1 failed, 56 deselected, 2 rerun in 2.75s =================== 2025-12-04T10:11:57.9682847Z Got exit code 1 2025-12-04T10:11:57.9682917Z Retrying single test... 
2025-12-04T10:11:57.9683181Z W1204 10:10:33.567000 66008 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9683569Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml 2025-12-04T10:11:57.9683669Z ============================= test session starts ============================== 2025-12-04T10:11:57.9683879Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9683944Z cachedir: .pytest_cache 2025-12-04T10:11:57.9684265Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9684342Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9684410Z configfile: pytest.ini 2025-12-04T10:11:57.9684732Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9684860Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9685429Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9685500Z Running 1 items in this shard 2025-12-04T10:11:57.9685504Z 2025-12-04T10:11:57.9686234Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:34.607733399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9686239Z 2025-12-04T10:11:57.9686539Z [W1204 10:10:43.607510043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9686542Z 2025-12-04T10:11:57.9686833Z [W1204 10:10:43.607762847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9686836Z 2025-12-04T10:11:57.9687126Z [W1204 10:10:43.613899272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9687216Z 2025-12-04T10:11:57.9687504Z [W1204 10:10:43.614487722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9687510Z 2025-12-04T10:11:57.9687798Z [W1204 10:10:43.614659345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9687837Z 2025-12-04T10:11:57.9688126Z [W1204 10:10:43.620094798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9688129Z 2025-12-04T10:11:57.9688421Z [W1204 10:10:43.620623937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9688424Z 2025-12-04T10:11:57.9688710Z [W1204 10:10:43.620778299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9688715Z 2025-12-04T10:11:57.9688800Z ('RERUN', {'yellow': True}) [10.8465s] [100%] 2025-12-04T10:11:57.9689525Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:44.789419187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9689531Z 2025-12-04T10:11:57.9689855Z [W1204 10:10:44.789979377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9689860Z 2025-12-04T10:11:57.9690148Z [W1204 10:10:44.790138480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9690151Z 2025-12-04T10:11:57.9690442Z [W1204 10:10:44.793032309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9690448Z 2025-12-04T10:11:57.9690733Z [W1204 10:10:44.793575788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9690737Z 2025-12-04T10:11:57.9691020Z [W1204 10:10:44.793717881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9691028Z 2025-12-04T10:11:57.9691316Z [W1204 10:10:44.798132966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9691319Z 2025-12-04T10:11:57.9691603Z [W1204 10:10:44.798583064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9691606Z 2025-12-04T10:11:57.9691897Z [W1204 10:10:44.798719316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9691900Z 2025-12-04T10:11:57.9691993Z ('RERUN', {'yellow': True}) [0.4057s] [100%] 2025-12-04T10:11:57.9692713Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:45.192145823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9692719Z 2025-12-04T10:11:57.9693008Z [W1204 10:10:45.192713263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9693011Z 2025-12-04T10:11:57.9693303Z [W1204 10:10:45.192853925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9693307Z 2025-12-04T10:11:57.9693592Z [W1204 10:10:45.195685723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9693595Z 2025-12-04T10:11:57.9693954Z [W1204 10:10:45.196222783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9693963Z 2025-12-04T10:11:57.9694249Z [W1204 10:10:45.196368615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9694285Z 2025-12-04T10:11:57.9694572Z [W1204 10:10:45.200741960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9694577Z 2025-12-04T10:11:57.9694868Z [W1204 10:10:45.201197297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9694871Z 2025-12-04T10:11:57.9695158Z [W1204 10:10:45.201335310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9695161Z 2025-12-04T10:11:57.9695227Z FAILED [0.4040s] [100%] 2025-12-04T10:11:57.9695232Z 2025-12-04T10:11:57.9695317Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9695612Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9695687Z Traceback (most recent call last): 2025-12-04T10:11:57.9695996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9696101Z method(*args, **kwargs) 2025-12-04T10:11:57.9696397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9696461Z method(*args, **kwargs) 2025-12-04T10:11:57.9696764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9696825Z with policy(): 2025-12-04T10:11:57.9697129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9697197Z raise RuntimeError(msg) 2025-12-04T10:11:57.9697989Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9697996Z 2025-12-04T10:11:57.9698127Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9698645Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9698649Z 2025-12-04T10:11:57.9698813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9698941Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9699039Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9699391Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9699520Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9699585Z graph_break [] 2025-12-04T10:11:57.9699710Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9700403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9700476Z if out == self.unknown_value: 2025-12-04T10:11:57.9700836Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9700914Z Traceback (most recent call last): 2025-12-04T10:11:57.9701214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9701311Z method(*args, **kwargs) 2025-12-04T10:11:57.9701609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9701672Z method(*args, **kwargs) 2025-12-04T10:11:57.9701963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9702026Z with policy(): 2025-12-04T10:11:57.9702322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9702390Z raise RuntimeError(msg) 2025-12-04T10:11:57.9703194Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9703199Z 2025-12-04T10:11:57.9703329Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9703880Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9703884Z 2025-12-04T10:11:57.9704042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9704172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9704267Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9704615Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9704741Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9704802Z graph_break [] 2025-12-04T10:11:57.9704930Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9705620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9705691Z if out == self.unknown_value: 2025-12-04T10:11:57.9705829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9705920Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9706047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9706388Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9706448Z graph_break [] 2025-12-04T10:11:57.9706540Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9706830Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9706906Z Traceback (most recent call last): 2025-12-04T10:11:57.9707205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9707270Z method(*args, **kwargs) 2025-12-04T10:11:57.9707563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9707698Z method(*args, **kwargs) 2025-12-04T10:11:57.9707989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9708052Z with policy(): 2025-12-04T10:11:57.9708379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9708447Z raise RuntimeError(msg) 2025-12-04T10:11:57.9709257Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9709262Z 2025-12-04T10:11:57.9709386Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9709904Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9709910Z 2025-12-04T10:11:57.9710064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9710193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9710284Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9710681Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9710817Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9710876Z graph_break [] 2025-12-04T10:11:57.9711007Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9711697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9711770Z if out == self.unknown_value: 2025-12-04T10:11:57.9711900Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9711994Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9712122Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9712467Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9712526Z graph_break [] 2025-12-04T10:11:57.9712651Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9712742Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9712875Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9713226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9713287Z graph_break [] 2025-12-04T10:11:57.9713786Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml - 2025-12-04T10:11:57.9713909Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9715226Z FAILED [0.4040s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9715265Z 2025-12-04T10:11:57.9715395Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9715948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9715954Z 2025-12-04T10:11:57.9716115Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9716220Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9716337Z ================== 1 failed, 57 deselected, 2 rerun in 11.68s ================== 2025-12-04T10:11:57.9716396Z Got exit code 1 2025-12-04T10:11:57.9716462Z Retrying single test... 2025-12-04T10:11:57.9716731Z W1204 10:10:51.748000 66194 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9717286Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml 2025-12-04T10:11:57.9717392Z ============================= test session starts ============================== 2025-12-04T10:11:57.9717600Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9717665Z cachedir: .pytest_cache 2025-12-04T10:11:57.9718035Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9718126Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9718193Z configfile: pytest.ini 2025-12-04T10:11:57.9718514Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9718644Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9719216Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9719288Z Running 1 items in this shard 2025-12-04T10:11:57.9719292Z 2025-12-04T10:11:57.9720057Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:52.789383828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9720066Z 2025-12-04T10:11:57.9720366Z [W1204 10:11:01.674350718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9720370Z 2025-12-04T10:11:57.9720664Z [W1204 10:11:01.674608782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9720669Z 2025-12-04T10:11:57.9720962Z [W1204 10:11:01.680486553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9720967Z 2025-12-04T10:11:57.9721252Z [W1204 10:11:01.681068473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9721256Z 2025-12-04T10:11:57.9721558Z [W1204 10:11:01.681235415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9721562Z 2025-12-04T10:11:57.9721853Z [W1204 10:11:01.686641337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9721857Z 2025-12-04T10:11:57.9722275Z [W1204 10:11:01.687170496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9722324Z 2025-12-04T10:11:57.9722619Z [W1204 10:11:01.687329929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9722667Z 2025-12-04T10:11:57.9722753Z ('RERUN', {'yellow': True}) [10.7303s] [100%] 2025-12-04T10:11:57.9723501Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:11:02.855035512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9723505Z 2025-12-04T10:11:57.9723796Z [W1204 10:11:02.855598662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9723801Z 2025-12-04T10:11:57.9724095Z [W1204 10:11:02.855737964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9724100Z 2025-12-04T10:11:57.9724386Z [W1204 10:11:02.858602903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9724390Z 2025-12-04T10:11:57.9724683Z [W1204 10:11:02.859153273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9724686Z 2025-12-04T10:11:57.9725007Z [W1204 10:11:02.859290115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9725011Z 2025-12-04T10:11:57.9725304Z [W1204 10:11:02.863763531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9725308Z 2025-12-04T10:11:57.9725596Z [W1204 10:11:02.864217968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9725602Z 2025-12-04T10:11:57.9725890Z [W1204 10:11:02.864366681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9725894Z 2025-12-04T10:11:57.9725977Z ('RERUN', {'yellow': True}) [0.4101s] [100%] 2025-12-04T10:11:57.9726698Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:11:03.263802566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9726704Z 2025-12-04T10:11:57.9726993Z [W1204 10:11:03.264376506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9726996Z 2025-12-04T10:11:57.9727285Z [W1204 10:11:03.264515028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9727292Z 2025-12-04T10:11:57.9727577Z [W1204 10:11:03.267422457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9727581Z 2025-12-04T10:11:57.9727867Z [W1204 10:11:03.267975886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9727870Z 2025-12-04T10:11:57.9728162Z [W1204 10:11:03.268117339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9728165Z 2025-12-04T10:11:57.9728452Z [W1204 10:11:03.272800028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9728455Z 2025-12-04T10:11:57.9728787Z [W1204 10:11:03.273262356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9728823Z 2025-12-04T10:11:57.9729112Z [W1204 10:11:03.273423589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9729115Z 2025-12-04T10:11:57.9729214Z FAILED [0.4091s] [100%] 2025-12-04T10:11:57.9729217Z 2025-12-04T10:11:57.9729302Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9729600Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9729680Z Traceback (most recent call last): 2025-12-04T10:11:57.9729990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9730061Z method(*args, **kwargs) 2025-12-04T10:11:57.9730362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9730428Z method(*args, **kwargs) 2025-12-04T10:11:57.9730730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9730791Z with policy(): 2025-12-04T10:11:57.9731089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9731162Z raise RuntimeError(msg) 2025-12-04T10:11:57.9731993Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9731999Z 2025-12-04T10:11:57.9732132Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9732656Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9732661Z 2025-12-04T10:11:57.9732825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9732954Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9733047Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9733400Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9733528Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9733588Z graph_break [] 2025-12-04T10:11:57.9733714Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9734410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9734483Z if out == self.unknown_value: 2025-12-04T10:11:57.9734777Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9734850Z Traceback (most recent call last): 2025-12-04T10:11:57.9735159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9735225Z method(*args, **kwargs) 2025-12-04T10:11:57.9735526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9735590Z method(*args, **kwargs) 2025-12-04T10:11:57.9735917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9736012Z with policy(): 2025-12-04T10:11:57.9736309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9736426Z raise RuntimeError(msg) 2025-12-04T10:11:57.9737234Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9737238Z 2025-12-04T10:11:57.9737366Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9737901Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9737908Z 2025-12-04T10:11:57.9738067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9738196Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9738291Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9738641Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9738804Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9738865Z graph_break [] 2025-12-04T10:11:57.9738987Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9739680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9739763Z if out == self.unknown_value: 2025-12-04T10:11:57.9739894Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9739985Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9740110Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9740456Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9740515Z graph_break [] 2025-12-04T10:11:57.9740601Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9740891Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:11:57.9740965Z Traceback (most recent call last): 2025-12-04T10:11:57.9741268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9741334Z method(*args, **kwargs) 2025-12-04T10:11:57.9741631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9741696Z method(*args, **kwargs) 2025-12-04T10:11:57.9741986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9742051Z with policy(): 2025-12-04T10:11:57.9742347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9742411Z raise RuntimeError(msg) 2025-12-04T10:11:57.9743505Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! 
Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9743549Z 2025-12-04T10:11:57.9743685Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9744211Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9744251Z 2025-12-04T10:11:57.9744412Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9744540Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9744633Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9744974Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9745105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9745164Z graph_break [] 2025-12-04T10:11:57.9745287Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9745982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9746053Z if out == self.unknown_value: 2025-12-04T10:11:57.9746211Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9746302Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9746425Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9746775Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9746835Z graph_break [] 2025-12-04T10:11:57.9746957Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9747046Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9747173Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9747509Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9747568Z graph_break [] 2025-12-04T10:11:57.9748053Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml - 2025-12-04T10:11:57.9748156Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9749445Z FAILED [0.4091s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9749455Z 2025-12-04T10:11:57.9749580Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9750099Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9750103Z 2025-12-04T10:11:57.9750260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9750400Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9750547Z ================== 1 failed, 57 deselected, 2 rerun in 11.57s ================== 2025-12-04T10:11:57.9750613Z Got exit code 1 2025-12-04T10:11:57.9751086Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:11:57.9751367Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9751631Z W1204 10:11:09.929000 66380 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9752017Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml 2025-12-04T10:11:57.9752120Z ============================= test session starts ============================== 2025-12-04T10:11:57.9752331Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9752401Z cachedir: .pytest_cache 2025-12-04T10:11:57.9752706Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9752783Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9752852Z configfile: pytest.ini 
2025-12-04T10:11:57.9753170Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9753334Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9753423Z stepcurrent: skipping 57 already run items. 2025-12-04T10:11:57.9753492Z Running 1 items in this shard 2025-12-04T10:11:57.9753496Z 2025-12-04T10:11:57.9753992Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8676s] [100%] 2025-12-04T10:11:57.9754479Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4534s] [100%] 2025-12-04T10:11:57.9754937Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4397s] [100%] 2025-12-04T10:11:57.9754941Z 2025-12-04T10:11:57.9755028Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9755319Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9755393Z Traceback (most recent call last): 2025-12-04T10:11:57.9755701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9755771Z method(*args, **kwargs) 2025-12-04T10:11:57.9756068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9756132Z method(*args, **kwargs) 2025-12-04T10:11:57.9756427Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9756488Z with policy(): 2025-12-04T10:11:57.9756785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9756853Z raise RuntimeError(msg) 2025-12-04T10:11:57.9757646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9757695Z 2025-12-04T10:11:57.9757860Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9758378Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9758415Z 2025-12-04T10:11:57.9758575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9758704Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9758797Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9759143Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9759270Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9759328Z graph_break [] 2025-12-04T10:11:57.9759622Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9759696Z Traceback (most recent call last): 2025-12-04T10:11:57.9760050Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9760119Z method(*args, **kwargs) 2025-12-04T10:11:57.9760410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9760515Z method(*args, **kwargs) 2025-12-04T10:11:57.9760809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9760871Z with policy(): 2025-12-04T10:11:57.9761168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9761235Z raise RuntimeError(msg) 2025-12-04T10:11:57.9762044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9762049Z 2025-12-04T10:11:57.9762173Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9762694Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9762698Z 2025-12-04T10:11:57.9762856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9762981Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9763084Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9763432Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9763562Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9763622Z graph_break [] 2025-12-04T10:11:57.9763756Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9763851Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9763973Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9764313Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9764377Z graph_break [] 2025-12-04T10:11:57.9764461Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9764843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9764918Z Traceback (most recent call last): 2025-12-04T10:11:57.9765225Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9765330Z method(*args, **kwargs) 2025-12-04T10:11:57.9765625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9765692Z method(*args, **kwargs) 2025-12-04T10:11:57.9765983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9766042Z with policy(): 2025-12-04T10:11:57.9766348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9766416Z raise RuntimeError(msg) 2025-12-04T10:11:57.9767231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9767240Z 2025-12-04T10:11:57.9767365Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9767921Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9767926Z 2025-12-04T10:11:57.9768087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9768213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9768308Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9768656Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9768782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9768844Z graph_break [] 2025-12-04T10:11:57.9768964Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9769055Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9769180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9769518Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9769583Z graph_break [] 2025-12-04T10:11:57.9769706Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9769801Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9769932Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9770273Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), 
('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9770339Z graph_break [] 2025-12-04T10:11:57.9770831Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml - 2025-12-04T10:11:57.9770931Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9772253Z FAILED [0.4397s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9772320Z 2025-12-04T10:11:57.9772448Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9772972Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9772975Z 2025-12-04T10:11:57.9773132Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9773242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9773358Z ================== 1 failed, 57 deselected, 2 rerun in 2.78s =================== 2025-12-04T10:11:57.9773418Z Got exit code 1 2025-12-04T10:11:57.9773493Z Retrying single test... 
2025-12-04T10:11:57.9773753Z W1204 10:11:19.570000 66561 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9774136Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml 2025-12-04T10:11:57.9774239Z ============================= test session starts ============================== 2025-12-04T10:11:57.9774480Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9774551Z cachedir: .pytest_cache 2025-12-04T10:11:57.9774865Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9774943Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9775012Z configfile: pytest.ini 2025-12-04T10:11:57.9775330Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9775457Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9776028Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9776099Z Running 1 items in this shard 2025-12-04T10:11:57.9776103Z 2025-12-04T10:11:57.9776842Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:20.847036940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9776846Z 2025-12-04T10:11:57.9777148Z [W1204 10:11:30.968059869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9777152Z 2025-12-04T10:11:57.9777447Z [W1204 10:11:30.968324473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9777450Z 2025-12-04T10:11:57.9777740Z [W1204 10:11:30.974270512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9777744Z 2025-12-04T10:11:57.9778037Z [W1204 10:11:30.974867302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9778041Z 2025-12-04T10:11:57.9778334Z [W1204 10:11:30.975038245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9778338Z 2025-12-04T10:11:57.9778630Z [W1204 10:11:30.980534746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9778703Z 2025-12-04T10:11:57.9778995Z [W1204 10:11:30.981076865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9778999Z 2025-12-04T10:11:57.9779285Z [W1204 10:11:30.981234888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9779321Z 2025-12-04T10:11:57.9779407Z ('RERUN', {'yellow': True}) [11.0205s] [100%] 2025-12-04T10:11:57.9780130Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:31.969446881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9780134Z 2025-12-04T10:11:57.9780427Z [W1204 10:11:31.970024290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9780434Z 2025-12-04T10:11:57.9780722Z [W1204 10:11:31.970167333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9780725Z 2025-12-04T10:11:57.9781014Z [W1204 10:11:31.973050541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9781019Z 2025-12-04T10:11:57.9781340Z [W1204 10:11:31.973594430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9781344Z 2025-12-04T10:11:57.9781640Z [W1204 10:11:31.973731442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9781643Z 2025-12-04T10:11:57.9781932Z [W1204 10:11:31.978129516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9781937Z 2025-12-04T10:11:57.9782226Z [W1204 10:11:31.978585654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9782233Z 2025-12-04T10:11:57.9782520Z [W1204 10:11:31.978721036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9782524Z 2025-12-04T10:11:57.9782609Z ('RERUN', {'yellow': True}) [0.4099s] [100%] 2025-12-04T10:11:57.9783332Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:31.377713587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9783336Z 2025-12-04T10:11:57.9783626Z [W1204 10:11:31.378277987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9783632Z 2025-12-04T10:11:57.9783923Z [W1204 10:11:31.378417399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9783926Z 2025-12-04T10:11:57.9784218Z [W1204 10:11:31.381398148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9784223Z 2025-12-04T10:11:57.9784515Z [W1204 10:11:31.381949317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9784519Z 2025-12-04T10:11:57.9784808Z [W1204 10:11:31.382088250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9784811Z 2025-12-04T10:11:57.9785106Z [W1204 10:11:31.386657796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9785109Z 2025-12-04T10:11:57.9785465Z [W1204 10:11:31.387117893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9785469Z 2025-12-04T10:11:57.9785765Z [W1204 10:11:31.387253945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9785804Z 2025-12-04T10:11:57.9785867Z FAILED [0.4068s] [100%] 2025-12-04T10:11:57.9785871Z 2025-12-04T10:11:57.9785963Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9786265Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9786339Z Traceback (most recent call last): 2025-12-04T10:11:57.9786645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9786718Z method(*args, **kwargs) 2025-12-04T10:11:57.9787017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9787084Z method(*args, **kwargs) 2025-12-04T10:11:57.9787375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9787436Z with policy(): 2025-12-04T10:11:57.9787735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9787839Z raise RuntimeError(msg) 2025-12-04T10:11:57.9788639Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9788651Z 2025-12-04T10:11:57.9788779Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9789299Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9789304Z 2025-12-04T10:11:57.9789472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9789600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9789698Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9790048Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9790175Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9790239Z graph_break [] 2025-12-04T10:11:57.9790364Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9791067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9791139Z if out == self.unknown_value: 2025-12-04T10:11:57.9791428Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9791506Z Traceback (most recent call last): 2025-12-04T10:11:57.9791810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9791885Z method(*args, **kwargs) 2025-12-04T10:11:57.9792187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9792307Z method(*args, **kwargs) 2025-12-04T10:11:57.9792639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9792700Z with policy(): 2025-12-04T10:11:57.9792998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9793101Z raise RuntimeError(msg) 2025-12-04T10:11:57.9793908Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9793911Z 2025-12-04T10:11:57.9794046Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9794566Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9794571Z 2025-12-04T10:11:57.9794729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9794858Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9794952Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9795332Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9795460Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9795518Z graph_break [] 2025-12-04T10:11:57.9795646Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9796340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9796417Z if out == self.unknown_value: 2025-12-04T10:11:57.9796539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9796630Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9796758Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9797104Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9797167Z graph_break [] 2025-12-04T10:11:57.9797250Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9797540Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9797619Z Traceback (most recent call last): 2025-12-04T10:11:57.9797920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9797983Z method(*args, **kwargs) 2025-12-04T10:11:57.9798282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9798345Z method(*args, **kwargs) 2025-12-04T10:11:57.9798644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9798702Z with policy(): 2025-12-04T10:11:57.9798996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9799066Z raise RuntimeError(msg) 2025-12-04T10:11:57.9799957Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9799993Z 2025-12-04T10:11:57.9800163Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9800682Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9800686Z 2025-12-04T10:11:57.9800844Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9800970Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9801060Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9801414Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9801540Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9801597Z graph_break [] 2025-12-04T10:11:57.9801724Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9802451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9802527Z if out == self.unknown_value: 2025-12-04T10:11:57.9802649Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9802738Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9802863Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9803203Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9803264Z graph_break [] 2025-12-04T10:11:57.9803391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9803480Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9803608Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9803953Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9804013Z graph_break [] 2025-12-04T10:11:57.9804502Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml - 2025-12-04T10:11:57.9804604Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9805888Z FAILED [0.4068s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9805894Z 2025-12-04T10:11:57.9806018Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9806537Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9806541Z 2025-12-04T10:11:57.9806734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9806876Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9806995Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ================== 2025-12-04T10:11:57.9807088Z Got exit code 1 2025-12-04T10:11:57.9807154Z Retrying single test... 2025-12-04T10:11:57.9807436Z W1204 10:11:37.942000 66747 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9807829Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml 2025-12-04T10:11:57.9807928Z ============================= test session starts ============================== 2025-12-04T10:11:57.9808137Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9808204Z cachedir: .pytest_cache 2025-12-04T10:11:57.9808515Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9808592Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9808660Z configfile: pytest.ini 2025-12-04T10:11:57.9808977Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9809109Z collecting ... 
collected 58 items / 57 deselected / 1 selected 2025-12-04T10:11:57.9809716Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9809787Z Running 1 items in this shard 2025-12-04T10:11:57.9809791Z 2025-12-04T10:11:57.9810528Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:39.203827166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9810533Z 2025-12-04T10:11:57.9810829Z [W1204 10:11:48.340366985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9810834Z 2025-12-04T10:11:57.9811128Z [W1204 10:11:48.340622279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9811132Z 2025-12-04T10:11:57.9811425Z [W1204 10:11:48.346423177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9811428Z 2025-12-04T10:11:57.9811717Z [W1204 10:11:48.347017087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9811724Z 2025-12-04T10:11:57.9812014Z [W1204 10:11:48.347198401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9812019Z 2025-12-04T10:11:57.9812307Z [W1204 10:11:48.352685324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9812312Z 2025-12-04T10:11:57.9812602Z [W1204 10:11:48.353223543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9812605Z 2025-12-04T10:11:57.9812896Z [W1204 10:11:48.353379635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9812899Z 2025-12-04T10:11:57.9812985Z ('RERUN', {'yellow': True}) [11.0243s] [100%] 2025-12-04T10:11:57.9813741Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:49.344970273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9813777Z 2025-12-04T10:11:57.9814073Z [W1204 10:11:49.345546802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9814109Z 2025-12-04T10:11:57.9814397Z [W1204 10:11:49.345691775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9814400Z 2025-12-04T10:11:57.9814694Z [W1204 10:11:49.348609944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9814697Z 2025-12-04T10:11:57.9814997Z [W1204 10:11:49.349168244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9815000Z 2025-12-04T10:11:57.9815293Z [W1204 10:11:49.349308886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9815298Z 2025-12-04T10:11:57.9815591Z [W1204 10:11:49.353808342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9815596Z 2025-12-04T10:11:57.9815882Z [W1204 10:11:49.354278690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9815885Z 2025-12-04T10:11:57.9816228Z [W1204 10:11:49.354416472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9816232Z 2025-12-04T10:11:57.9816312Z ('RERUN', {'yellow': True}) [0.4116s] [100%] 2025-12-04T10:11:57.9817185Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:49.753824159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9817191Z 2025-12-04T10:11:57.9817487Z [W1204 10:11:49.754391518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9817492Z 2025-12-04T10:11:57.9817787Z [W1204 10:11:49.754536861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9817790Z 2025-12-04T10:11:57.9818082Z [W1204 10:11:49.757472030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9818085Z 2025-12-04T10:11:57.9818378Z [W1204 10:11:49.758032250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:11:57.9818385Z 2025-12-04T10:11:57.9818675Z [W1204 10:11:49.758172162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9818681Z 2025-12-04T10:11:57.9818972Z [W1204 10:11:49.762670949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9818977Z 2025-12-04T10:11:57.9819275Z [W1204 10:11:49.763136126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9819277Z 2025-12-04T10:11:57.9819565Z [W1204 10:11:49.763274289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:11:57.9819568Z 2025-12-04T10:11:57.9819631Z FAILED [0.4088s] [100%] 2025-12-04T10:11:57.9819634Z 2025-12-04T10:11:57.9819722Z ==================================== RERUNS ==================================== 2025-12-04T10:11:57.9820015Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9820206Z Traceback (most recent call last): 2025-12-04T10:11:57.9820521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9820597Z method(*args, **kwargs) 2025-12-04T10:11:57.9820948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9821012Z method(*args, **kwargs) 2025-12-04T10:11:57.9821311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9821371Z with policy(): 2025-12-04T10:11:57.9821670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T10:11:57.9821737Z raise RuntimeError(msg) 2025-12-04T10:11:57.9822537Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912. 2025-12-04T10:11:57.9822543Z 2025-12-04T10:11:57.9822676Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9823246Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9823250Z 2025-12-04T10:11:57.9823418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9823548Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9823641Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9823996Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9824129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9824191Z graph_break [] 2025-12-04T10:11:57.9824316Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9825010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9825086Z if out == self.unknown_value: 2025-12-04T10:11:57.9825388Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9825467Z Traceback (most recent call last): 2025-12-04T10:11:57.9825769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9825836Z method(*args, **kwargs) 2025-12-04T10:11:57.9826131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9826194Z method(*args, **kwargs) 2025-12-04T10:11:57.9826488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9826550Z with policy(): 2025-12-04T10:11:57.9826847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9826915Z raise RuntimeError(msg) 2025-12-04T10:11:57.9827756Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064. 
2025-12-04T10:11:57.9827793Z 2025-12-04T10:11:57.9827920Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9828453Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9828491Z 2025-12-04T10:11:57.9828650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9828784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9828879Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9829226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9829361Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9829422Z graph_break [] 2025-12-04T10:11:57.9829549Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9830241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9830311Z if out == self.unknown_value: 2025-12-04T10:11:57.9830539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9830631Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9830761Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9831105Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9831172Z graph_break [] 2025-12-04T10:11:57.9831261Z =================================== FAILURES =================================== 2025-12-04T10:11:57.9831552Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:11:57.9831627Z Traceback (most recent call last): 2025-12-04T10:11:57.9831930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9832004Z method(*args, **kwargs) 2025-12-04T10:11:57.9832310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:11:57.9832375Z method(*args, **kwargs) 2025-12-04T10:11:57.9832664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:11:57.9832728Z with policy(): 2025-12-04T10:11:57.9833025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:11:57.9833096Z raise RuntimeError(msg) 2025-12-04T10:11:57.9833900Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 2025-12-04T10:11:57.9833905Z 2025-12-04T10:11:57.9834030Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9834554Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9834557Z 2025-12-04T10:11:57.9834717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9834917Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9835008Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9835349Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9835512Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9835573Z graph_break [] 2025-12-04T10:11:57.9835702Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:11:57.9836390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:11:57.9836461Z if out == self.unknown_value: 2025-12-04T10:11:57.9836588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9836679Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9836806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9837148Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9837219Z graph_break [] 2025-12-04T10:11:57.9837384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:11:57.9837477Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:11:57.9837599Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:11:57.9837941Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:11:57.9838000Z graph_break [] 2025-12-04T10:11:57.9838500Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml - 2025-12-04T10:11:57.9838601Z =========================== short test summary info ============================ 2025-12-04T10:11:57.9839928Z FAILED [0.4088s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216. 
2025-12-04T10:11:57.9839937Z 2025-12-04T10:11:57.9840064Z To execute this test, run the following from the base repo dir: 2025-12-04T10:11:57.9840585Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9840594Z 2025-12-04T10:11:57.9840751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:11:57.9840860Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:11:57.9840978Z ================== 1 failed, 57 deselected, 2 rerun in 11.87s ================== 2025-12-04T10:11:57.9841038Z Got exit code 1 2025-12-04T10:11:57.9841510Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:11:57.9841756Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:11:57.9842058Z W1204 10:11:56.323000 66933 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:11:57.9842481Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml 2025-12-04T10:11:57.9842629Z ============================= test session starts ============================== 2025-12-04T10:11:57.9842840Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:11:57.9842910Z cachedir: .pytest_cache 2025-12-04T10:11:57.9843221Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:11:57.9843298Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:11:57.9843364Z configfile: pytest.ini 
2025-12-04T10:11:57.9843679Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:11:57.9843812Z collecting ... collected 58 items / 58 deselected / 0 selected 2025-12-04T10:11:57.9843912Z stepcurrent: skipping 58 already run items. 2025-12-04T10:11:57.9843984Z Running 0 items in this shard 2025-12-04T10:11:57.9843988Z 2025-12-04T10:11:57.9844473Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml - 2025-12-04T10:11:57.9844572Z ============================ 58 deselected in 0.01s ============================ 2025-12-04T10:11:57.9870297Z The following tests failed consistently: ['test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16'] 2025-12-04T10:11:57.9870459Z 2025-12-04T10:11:57.9870839Z FINISHED PRINTING LOG FILE of inductor/test_cuda_select_algorithm 1/1 (test/test-reports/inductor.test_cuda_select_algorithm_1.1_c5144f504c6801ae_.log) 2025-12-04T10:11:57.9870843Z 2025-12-04T10:11:57.9871078Z Finished inductor/test_cuda_select_algorithm 1/1 ... 
[2025-12-04 10:11:57.382849][4757.31113764], took 45.35min 2025-12-04T10:11:57.9871599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml 2025-12-04T10:11:57.9872147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml 2025-12-04T10:11:57.9872691Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml 2025-12-04T10:11:57.9873195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml 2025-12-04T10:11:57.9873753Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml 2025-12-04T10:11:57.9874257Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml 2025-12-04T10:11:57.9874762Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml 2025-12-04T10:11:57.9875275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml 2025-12-04T10:11:57.9875782Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml 2025-12-04T10:11:57.9876321Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml 2025-12-04T10:11:57.9876829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml 2025-12-04T10:11:57.9877340Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml 2025-12-04T10:11:57.9877847Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml 2025-12-04T10:11:57.9878360Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml 2025-12-04T10:11:57.9878889Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml 2025-12-04T10:11:57.9879392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml 2025-12-04T10:11:57.9879940Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml 2025-12-04T10:11:57.9907058Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml 2025-12-04T10:11:58.0236679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml 2025-12-04T10:11:58.0575123Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml 2025-12-04T10:11:58.0915647Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml 2025-12-04T10:11:58.1300515Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml 2025-12-04T10:11:58.1657732Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml 2025-12-04T10:11:58.1998949Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml 2025-12-04T10:11:58.2305748Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml 2025-12-04T10:11:58.2628483Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml 2025-12-04T10:11:58.3015114Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml 2025-12-04T10:11:58.3404979Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml 2025-12-04T10:11:58.3976957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml 2025-12-04T10:11:58.4318353Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml 2025-12-04T10:11:58.4627166Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml 2025-12-04T10:11:58.4946339Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml 2025-12-04T10:11:58.5277539Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml 2025-12-04T10:11:58.5641351Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml 2025-12-04T10:11:58.5947831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml 2025-12-04T10:11:58.6267381Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml 2025-12-04T10:11:58.6557596Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml 2025-12-04T10:11:58.6889990Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml 2025-12-04T10:11:58.7177716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml 2025-12-04T10:11:58.7482437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml 2025-12-04T10:11:58.7787926Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml 2025-12-04T10:11:58.8116239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml 2025-12-04T10:11:58.8475298Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml 2025-12-04T10:11:58.8844677Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml 2025-12-04T10:11:58.9195603Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml 2025-12-04T10:11:58.9496721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml 2025-12-04T10:11:58.9823888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml 2025-12-04T10:11:59.0111098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml 2025-12-04T10:11:59.0433504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml 2025-12-04T10:11:59.0876396Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml 2025-12-04T10:11:59.1167903Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml 2025-12-04T10:11:59.1464249Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml 2025-12-04T10:11:59.1806183Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml 2025-12-04T10:11:59.2108093Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml 2025-12-04T10:11:59.2443642Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml 2025-12-04T10:11:59.2767929Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml 2025-12-04T10:11:59.3107721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml 2025-12-04T10:11:59.3497175Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml 2025-12-04T10:11:59.3878388Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml 2025-12-04T10:11:59.4338244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml 2025-12-04T10:11:59.4636836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml 2025-12-04T10:11:59.4967254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml 2025-12-04T10:11:59.5265434Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml 2025-12-04T10:11:59.5595504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml 2025-12-04T10:11:59.5884389Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml 2025-12-04T10:11:59.6222159Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml 2025-12-04T10:11:59.6616744Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml 2025-12-04T10:11:59.7007553Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml 2025-12-04T10:11:59.7304445Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml 2025-12-04T10:11:59.7716066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml 2025-12-04T10:11:59.8007902Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml 2025-12-04T10:11:59.8573022Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml 2025-12-04T10:11:59.8855877Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml 2025-12-04T10:11:59.9164012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml 2025-12-04T10:11:59.9495251Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml 2025-12-04T10:11:59.9793719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml 2025-12-04T10:12:00.0137938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml 2025-12-04T10:12:00.0462883Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml 2025-12-04T10:12:00.0807260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml 2025-12-04T10:12:00.1145749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml 2025-12-04T10:12:00.1493858Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml 2025-12-04T10:12:00.1865939Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml 2025-12-04T10:12:00.2143193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml 2025-12-04T10:12:00.2456463Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml 2025-12-04T10:12:00.2784037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml 2025-12-04T10:12:00.3096073Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml 2025-12-04T10:12:00.3423693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml 2025-12-04T10:12:00.3738917Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml 2025-12-04T10:12:00.4078751Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml 2025-12-04T10:12:00.4384638Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml 2025-12-04T10:12:00.4727371Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml 2025-12-04T10:12:00.5054836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml 2025-12-04T10:12:00.5358976Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml 2025-12-04T10:12:00.5713927Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml 2025-12-04T10:12:00.6046935Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml 2025-12-04T10:12:00.6738713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml 2025-12-04T10:12:00.7054421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml 2025-12-04T10:12:00.7378946Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml 2025-12-04T10:12:00.7687662Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml 2025-12-04T10:12:00.8036414Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml 2025-12-04T10:12:00.8405693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml 2025-12-04T10:12:00.9690494Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml 2025-12-04T10:12:01.0019287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml 2025-12-04T10:12:01.0323523Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml 2025-12-04T10:12:01.0639375Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml 2025-12-04T10:12:01.0984139Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml 2025-12-04T10:12:01.1345592Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml 2025-12-04T10:12:01.1836395Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml 2025-12-04T10:12:01.2156634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml 2025-12-04T10:12:01.2493738Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml 2025-12-04T10:12:01.2807861Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml 2025-12-04T10:12:01.3172679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml 2025-12-04T10:12:01.3437935Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml 2025-12-04T10:12:01.3738085Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml 2025-12-04T10:12:01.4038693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml 2025-12-04T10:12:01.4425739Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml 2025-12-04T10:12:01.4776844Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml 2025-12-04T10:12:01.5165040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml 2025-12-04T10:12:01.5408913Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml 2025-12-04T10:12:01.5697831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml 2025-12-04T10:12:01.5982470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml 2025-12-04T10:12:01.6323146Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml 2025-12-04T10:12:01.7127498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml 2025-12-04T10:12:01.7471370Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml 2025-12-04T10:12:01.7746967Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml 2025-12-04T10:12:01.8038307Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml 2025-12-04T10:12:01.8335709Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml 2025-12-04T10:12:01.8754567Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml 2025-12-04T10:12:01.9026055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml 2025-12-04T10:12:01.9299812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml 2025-12-04T10:12:01.9755413Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml 2025-12-04T10:12:02.0046040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml 2025-12-04T10:12:02.0346508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml 2025-12-04T10:12:02.0613636Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml 2025-12-04T10:12:02.0968694Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml 2025-12-04T10:12:02.1259302Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml 2025-12-04T10:12:02.1555039Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml 2025-12-04T10:12:02.1876206Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml 2025-12-04T10:12:02.2146122Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml 2025-12-04T10:12:02.2437490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml 2025-12-04T10:12:02.2803447Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml 2025-12-04T10:12:02.3097512Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml 2025-12-04T10:12:02.3398942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml 2025-12-04T10:12:02.3724237Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml 2025-12-04T10:12:02.4024848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml 2025-12-04T10:12:02.4455767Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml 2025-12-04T10:12:02.4798529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml 2025-12-04T10:12:02.5133617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml 2025-12-04T10:12:02.5466743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml 2025-12-04T10:12:02.5758476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml 2025-12-04T10:12:02.6016432Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml 2025-12-04T10:12:02.6266093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml 2025-12-04T10:12:02.6536263Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml 2025-12-04T10:12:02.6925953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml 2025-12-04T10:12:02.7226502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml 2025-12-04T10:12:02.7499168Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml 2025-12-04T10:12:02.7784100Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml 2025-12-04T10:12:02.8187263Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml 2025-12-04T10:12:02.8495968Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml 2025-12-04T10:12:02.8791878Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml 2025-12-04T10:12:02.9079251Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml 2025-12-04T10:12:02.9418127Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml 2025-12-04T10:12:02.9718019Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml 2025-12-04T10:12:03.0018094Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml 2025-12-04T10:12:03.0329428Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml 2025-12-04T10:12:03.0623176Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml 2025-12-04T10:12:03.0906624Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml 2025-12-04T10:12:03.1217726Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml 2025-12-04T10:12:03.1495407Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml 2025-12-04T10:12:03.1819326Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml 2025-12-04T10:12:03.2127587Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml 2025-12-04T10:12:03.2379096Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml 2025-12-04T10:12:03.2685598Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml 2025-12-04T10:12:03.2984497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml 2025-12-04T10:12:03.3279807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml 2025-12-04T10:12:03.6466180Z Uploading logs for 57116084862 to S3 2025-12-04T10:12:03.7671762Z Uploading artifacts took 0.42 seconds 2025-12-04T10:12:03.7672132Z inductor/test_cuda_select_algorithm 1/1 failed! 2025-12-04T10:12:03.7675596Z Running inductor/test_compile_subprocess 2/2 ... [2025-12-04 10:12:03.767352][4763.695648876] 2025-12-04T10:12:03.7676089Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:12:03.7679456Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_subprocess.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:12:03.767694] 2025-12-04T10:18:48.8724179Z 2025-12-04T10:18:48.8725459Z inductor/test_compile_subprocess 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_subprocess_2.2_b00c67e905398654_.log 2025-12-04T10:18:48.8932054Z Running 464 items in this shard: test/inductor/test_compile_subprocess.py::TestSubprocess::test_async, test/inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_pack_4bit_weight_bf16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex9_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex_strided_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_const_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_const_int_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aliased_buffer_reuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_allow_reuse_active_if_under_peak_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_any_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_cache_hit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_dtype_device_layout_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_support_str_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_with_persistent_cache_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_duplicates_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_nan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_min_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_as_strided_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_as_strided_on_views_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_alignment_op_name_pass_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_size_stride_op_name_fail_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_async, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool_errors_with_uint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_bfloat16_to_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bmm2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_add_autotune_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_default_kwargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_nd_tiling_False_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_batch_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_pos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_empty_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_extern_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_of_loops_and_extern_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_unbacked_empty_1d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_upcasting_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cauchy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_check_stack_no_cycles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_type_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clone_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_computed_buffer_inlining_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_concat_add_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_cudagraphs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_recompiles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumprod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_2d_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_3d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_nd_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv1d_depthwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv2d_backward_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv2d_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv3d_channels_last_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_functional_bn_fuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_scalar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cpp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_tensor_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cudnn_rnn_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cummin_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumprod_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_no_mask_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_zero_dim_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_compiled_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_would_split_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_deterministic_codegen_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_deterministic_codegen_on_graph_break_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_device_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dist_bf16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dist_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_deterministic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_trivial_0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_uint8_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_bfloat16_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_bag_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_sparse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_empty1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_empty2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_emulate_precision_triton_fp_fusion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfinv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expanded_reduction_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_expm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_basic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_list_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_list_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_no_mutated_tensors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_with_return_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fft_real_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fill1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float16_to_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float32_to_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_type_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmin_fmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmod_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_sliced_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_transposed_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_truncation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fuse_large_params_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fuse_tiled_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_fusing_write_into_disjoint_read_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generate_rand_fp8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generated_code_has_alignment_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generated_code_has_size_stride_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_glu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gpu_scalar_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gpu_scalar_with_gpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_arange1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_argmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_both_scalars_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_no_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_pad_dynamic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_refcount_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_unbacked_symint_as_output_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_expand_preserves_view_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardsigmoid_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardswish_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_horizonal_fusion2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_float_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_abs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_device_assert_masked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_flip_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_floordiv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_nested_indirect_indexing_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_select_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_multiple_specializations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_triton_bucketize_respects_masking_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inner_fn_str_and_stride_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_activations_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_mixed_dtype_ops_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_where_pointwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_int_input_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_invalid_operand_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_pointwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_strided_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_layer_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_leaky_relu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lerp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lgamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands_sliced_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_dynamic_maxautotune_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_linear_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linspace1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linspace3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_dynamic_shape_assertion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_mode_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_flex_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_invoke_subgraph_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_repeated_blocks_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log1p_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log_softmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_long_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mark_dynamic_with_hint_override_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mark_unbacked_with_hint_override_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_matmul_layer_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_min_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d6_dilation_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_min_max_reduction_nan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mm_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mm_views_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_any_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_sum_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_lowp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutations_loop_fusion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_assert_inside_triton_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_to_num_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_narrow_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_neg_max_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_forward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_no_specization_over_symbolic_value_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nonzero_unbacked_refinement_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_output_strides_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_pad_cast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pad_view_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_multi_user_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_permute2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pixel_shuffle_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_airy_ai_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_v_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_w_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfcx_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaincc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_hermite_polynomial_he_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i0e_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i1e_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_log_ndtr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_logit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_multigammaln_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_ndtr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_psi_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_t_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_u_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_v_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_w_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_polar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow_by_natural_log2_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_prepare_softmax_with_fast_math_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_progressive, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_distribution_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_int64_mod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_kernel_count_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_like_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_with_dtype_and_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_clone_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_copy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_decomposition_has_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_replication_pad_errors_with_bool_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_require_stride_expanded_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_resize_as_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_resize_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roi_align_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roll_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_round_correctness_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_cpu_tensor_arg_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_output_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_efficient_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_add2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_add3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scheduler_vertical_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_freezing_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_searchsorted_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_shape_prop_torch_ones_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_should_pad_bench_for_bmm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sigmoid_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sign_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_signbit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_simplify_loops_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sin_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_single_elem_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_single_elem_indirect_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sizehint_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_softmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_softmax_one_kernel_persist_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_split_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumprod_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_dynamic_shape_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_with_int64_size_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_integer_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_unbacked_symints_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sqrt_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze_varargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_stack_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor_index_slice_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_torch_device_split_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_transpose_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_uint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbind_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unfold_zero_dimension_tensor_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_unsigned_constant_tensors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unsqueeze_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_bicubic2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_bilinear2d_b_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest3d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_div_by_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vectorized_ops_masked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vectorized_ops_masked_var_novec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vertical_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_on_aliased_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_uint8_through_differing_bitwidths_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_xblock_divides_xnumel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_zeros_cuda 2025-12-04T10:18:48.9128757Z 2025-12-04T10:18:48.9129247Z Finished inductor/test_compile_subprocess 2/2 ... 
[2025-12-04 10:18:48.876683][5168.804975674], took 6.75min
2025-12-04T10:18:48.9130816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-2cb94ab29d3c0df8.xml
2025-12-04T10:18:49.0325917Z Running test_decomp 1/22 ... [2025-12-04 10:18:49.032356][5168.96065298]
2025-12-04T10:18:49.0326335Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:18:49.0329180Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=1', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:18:49.032659]
2025-12-04T10:25:12.1067611Z
2025-12-04T10:25:12.1069065Z test_decomp 1/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_1.22_14e10bdd16255327_.log
2025-12-04T10:25:12.1245988Z Running 435 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_complex128,
test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_add_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logaddexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_uint8
2025-12-04T10:25:12.1422798Z
2025-12-04T10:25:12.1423134Z Finished test_decomp 1/22 ... [2025-12-04 10:25:12.107179][5552.035472922], took 6.38min
2025-12-04T10:25:12.1424299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-ab5aa4d4069f84fb.xml
2025-12-04T10:25:12.2404057Z Running test_decomp 5/22 ... [2025-12-04 10:25:12.240137][5552.168436024]
2025-12-04T10:25:12.2404642Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:25:12.2407176Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=5', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:25:12.240446]
2025-12-04T10:32:20.1281690Z
2025-12-04T10:32:20.1283742Z test_decomp 5/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_5.22_0df19fb5b56a60f0_.log
2025-12-04T10:32:20.1375510Z Running 411 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_igamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_squeeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_igammac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_train_mode_cuda_float64, test/test_decomp.py::DecompOneOffTestsCUDA::test_amp_batch_norm_backward_cuda 2025-12-04T10:32:20.1462471Z 2025-12-04T10:32:20.1462656Z Finished test_decomp 5/22 ... [2025-12-04 10:32:20.128755][5980.057047772], took 7.13min 2025-12-04T10:32:20.1504733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-35f77a6004c12574.xml 2025-12-04T10:32:20.6199910Z Uploading artifacts took 0.39 seconds 2025-12-04T10:32:20.6203502Z Running test_decomp 10/22 ... [2025-12-04 10:32:20.620138][5980.548434446] 2025-12-04T10:32:20.6203925Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:32:20.6207236Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=10', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:32:20.620467] 2025-12-04T10:37:04.1376573Z 2025-12-04T10:37:04.1378754Z test_decomp 10/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_10.22_12f8ad2c135e2bbb_.log 2025-12-04T10:37:04.1468126Z Running 393 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_eval_mode_cuda_float64 2025-12-04T10:37:04.1552824Z 2025-12-04T10:37:04.1553017Z Finished test_decomp 10/22 ... [2025-12-04 10:37:04.138274][6264.066567969], took 4.73min 2025-12-04T10:37:04.1600195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-4e86df63b0f31fa2.xml 2025-12-04T10:37:04.2516678Z Running test_decomp 15/22 ... [2025-12-04 10:37:04.251441][6264.179739275] 2025-12-04T10:37:04.2517419Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:37:04.2520213Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=15', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:37:04.251741] 2025-12-04T10:44:51.0558308Z 2025-12-04T10:44:51.0559611Z test_decomp 15/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_15.22_3612d41b87c57e18_.log 2025-12-04T10:44:51.0655756Z Running 431 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_hardshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_eval_mode_cuda_float32, 
test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float16 2025-12-04T10:44:51.0748451Z 2025-12-04T10:44:51.0748637Z Finished test_decomp 15/22 ... [2025-12-04 10:44:51.056364][6730.984655191], took 7.78min 2025-12-04T10:44:51.0779059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-421be54fd2475226.xml 2025-12-04T10:44:51.1743507Z Running test_decomp 20/22 ... [2025-12-04 10:44:51.174097][6731.102395458] 2025-12-04T10:44:51.1743902Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:44:51.1746669Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=20', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:44:51.174395] 2025-12-04T10:50:29.4905709Z 2025-12-04T10:50:29.4907585Z test_decomp 20/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_20.22_0285151ec6a3cff1_.log 2025-12-04T10:50:29.5000820Z Running 411 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gcd_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_transpose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_train_mode_cuda_float32, test/test_decomp.py::HasDecompTest::test_has_decomposition 
2025-12-04T10:50:29.5087221Z 2025-12-04T10:50:29.5087410Z Finished test_decomp 20/22 ... [2025-12-04 10:50:29.491247][7069.419542076], took 5.64min 2025-12-04T10:50:29.5132009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-269f711d29febed7.xml 2025-12-04T10:50:29.6245436Z Running test_ci_sanity_check_fail 1/1 ... [2025-12-04 10:50:29.624292][7069.552590124] 2025-12-04T10:50:29.6246049Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:50:29.6248578Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ci_sanity_check_fail.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:50:29.624596] 2025-12-04T10:50:42.1243384Z Finished test_ci_sanity_check_fail 1/1 ... [2025-12-04 10:50:42.123913][7082.052205858], took 0.21min 2025-12-04T10:50:42.1458012Z Running test_ops 3/9 ... [2025-12-04 10:50:42.145565][7082.073863143] 2025-12-04T10:50:42.1458543Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:50:42.1461654Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=3', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:50:42.145923] 2025-12-04T11:12:57.0030510Z 2025-12-04T11:12:57.0031236Z test_ops 3/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_3.9_cd041df637bc19f1_.log 2025-12-04T11:12:57.0899519Z Running 3669 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_torch__scaled_mm_cuda_float8_e4m3fn, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_H_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsqueeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_polar_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vecdot_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_min_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mH_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardswish_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_linear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_t_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_where_cuda, test/test_ops.py::TestCommonCUDA::test_errors_amin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_item_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_errors_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_tril_cuda, test/test_ops.py::TestCommonCUDA::test_errors_where_cuda, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_hstack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rmul___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_channel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_cosine_embedding_loss_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_put_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize_as__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_list_args_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tile_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_normal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_digamma_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_minimum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cat_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logaddexp_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_out_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestCommonCUDA::test_out_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_count_nonzero_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softplus_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diff_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rms_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softsign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sparse_mm_reduce_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_where_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_out_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e4m3fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_istft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_polar_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e5m2, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_item_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_outer_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifft_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_randn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sum_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_1d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_permuted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resize__cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atleast_1d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logaddexp_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pca_lowrank_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_transpose_copy_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vsplit_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view__chunk_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_alias_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flip_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_item_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ne_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_positive_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_any_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_combinations_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_unary_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cond_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pca_lowrank_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensor_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unsafe_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_equal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_istft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_renorm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_multiple_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_t_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_alias_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_copy_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gradient_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_unary_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_householder_product_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mH_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_normal_in_place_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rand_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_list_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_multiple_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tile_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view_T_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_geometric_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_roll_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_softmax_with_dtype_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__segment_reduce_offsets_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_any_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_inverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_float_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log10_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nan_to_num_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_linear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu6_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_selu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_svd_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trunc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsafe_chunk_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__chunk_cat_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_outer_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_copy_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_nearest-exact_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_amin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_in_place_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_complex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_shuffle_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_float64, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int8, test/test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__chunk_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_char_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_alias_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_copysign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_count_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_float_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_xor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_erfcx_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_stft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addbmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_any_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bincount_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_corrcoef_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gradient_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardsigmoid_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_fro_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scalar_tensor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_t_copy_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_tensordot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_like_cuda_float32
2025-12-04T11:12:57.1746644Z
2025-12-04T11:12:57.1746839Z Finished test_ops 3/9 ... [2025-12-04 11:12:57.006945][8416.935238594], took 22.25min
2025-12-04T11:12:57.1747485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-f35e359ea3f52347.xml
2025-12-04T11:12:57.6745956Z Uploading artifacts took 0.48 seconds
2025-12-04T11:12:57.6748264Z Running test_ops 8/9 ... [2025-12-04 11:12:57.674612][8417.602908108]
2025-12-04T11:12:57.6748657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:12:57.6751815Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=8', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:12:57.674936]
2025-12-04T11:33:20.9917703Z
2025-12-04T11:33:20.9918436Z test_ops 8/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_8.9_44eee4e8e92c270e_.log
2025-12-04T11:33:21.0786183Z Running 3704 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32,
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_inner_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_conj_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing___getitem___cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mul_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsafe_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__chunk_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_multiple_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unflatten_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_with_logits_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_logsigmoid_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nonzero_static_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_squeeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch_ops_aten__flash_attention_forward_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsafe_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_errors___ror___cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_normal_in_place_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout0_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_errors_trace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ifft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_bessel_j0_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cauchy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_chunk_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hash_tensor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_return_by_ref_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_binary_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_reduce_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zero__cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rxor___cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_permuted_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exponential_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_in_place_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsafe_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diag_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mul_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_unbiased_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu6_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__unsafe_masked_index_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cov_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_multi_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mT_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_channel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest-exact_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_kl_div_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_nuc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_permute_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rand_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_roll_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y1_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal__in_place_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e5m2fnuz, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__upsample_bilinear2d_aa_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rsub_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_deg2rad_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_hash_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_movedim_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_reflect_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_zeta_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fliplr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geometric_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_zeta_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fill_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_permute_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i1_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_conj_physical_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_geometric_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_xor_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_scaled_dot_product_attention_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_zeros_like_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isreal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logaddexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_multiple_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cov_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_einsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expm1_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_hstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ldexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_unpack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_interleave_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rpow___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_any_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expm1_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_masked_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_square_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagflat_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_triangular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logaddexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_outer_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sgn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_contiguous_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flipud_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_erfcx_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_copy_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_alias_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_allclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kron_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_slogdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_constant_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_rms_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softplus_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ormqr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pca_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_interleave_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_exponential_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_t_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tensor_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unique_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_xor_cuda_int64, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geometric_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hash_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumprod_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_complex_cuda_complex64, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_diagonal_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__unsafe_masked_index_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tile_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_zeros_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_cosine_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_byte_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_item_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_tensor_overload_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softshrink_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_in_place_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_with_sizes_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_uint8, 
test/test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rsub___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_block_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ravel_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_signbit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_entr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_angle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_asin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfftn_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanquantile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardswish_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_to_sparse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestTagsCUDA::test_tags_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_consecutive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unsqueeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_floor_rounding_cuda_float32 2025-12-04T11:33:21.1629192Z 
2025-12-04T11:33:21.1629396Z Finished test_ops 8/9 ... [2025-12-04 11:33:20.996037][9640.924329935], took 20.39min 2025-12-04T11:33:21.1630025Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-7327fc5de50caef8.xml 2025-12-04T11:33:21.6231928Z Uploading artifacts took 0.45 seconds 2025-12-04T11:33:21.6235846Z Running inductor/test_torchinductor_dynamic_shapes 4/4 ... [2025-12-04 11:33:21.623360][9641.551655226] 2025-12-04T11:33:21.6236357Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:33:21.6240204Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:33:21.623726] 2025-12-04T11:43:07.7900340Z 2025-12-04T11:43:07.7901390Z inductor/test_torchinductor_dynamic_shapes 4/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_4.4_d7a417aa701cd416_.log 2025-12-04T11:43:07.8087282Z Running 518 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_matmul_4bit_bf16_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adding_tensor_offsets_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_override_registration_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_support_out_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_batch_norm_2d_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bfloat16_to_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bmm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_both_scalars_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_add_autotune_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int32_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cauchy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_concat_add_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_config_option_dont_assume_alignment_cudagraphs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_const_int32_to_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv1d_depthwise_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv2d_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_functional_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_inference_heuristics_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumprod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_by_zero_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_presicion_accuracy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_prim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_trivial_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int32_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_elu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_empty2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_exact_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_expand_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_expm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fft_real_input_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fft_real_input_real_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fill1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flip_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float16_to_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_repr_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_full_like_transposed_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fuse_large_params_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_generate_rand_fp8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_getitem_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_glu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_arange2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_grid_sampler_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_float_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_indirect_load_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inf_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inner_reduction_detection_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_where_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_insignificant_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_isinf2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_kernel_names_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_strided_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_list_clearing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mark_dynamic_with_hint_override_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_masked_fill_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_matmul_layer_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d5_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mix_device_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mul_index_expr_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multi_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_to_num_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_narrow_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_ones_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_no_op_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_no_specization_over_symbolic_value_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_output_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pad_view_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_permute1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pixel_shuffle_channels_last_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_chebyshev_polynomial_u_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_expm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i0e_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1e_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_psi_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_spherical_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_by_natural_log2_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randint_int64_mod_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_generator_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_relu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_slice_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_view_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_roi_align_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rsqrt_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_bf16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_unaligned_mask_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_select_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_silu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_simplify_loops_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sizehint_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_view_with_graph_break_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_one_kernel_loop_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_one_kernel_persist_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_reduction_dynamic_shape_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor_index_put_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tmp_not_defined_issue2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_to_memory_format_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_triu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unbacked_float_item_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unroll_small_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_as_complex_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_as_real_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_bwd_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_where_with_logical_op_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_zero_element_mutation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex_strided_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_inplace_permuted_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adding_tensor_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_alexnet_prefix_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aliased_buffer_reuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_with_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin_with_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_as_strided_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_alignment_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool_errors_with_uint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_baddbmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_batch_norm_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bmm2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_default_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_nd_tiling_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_int_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_compar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_from_real_imag_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_config_option_dont_assume_alignment_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_consecutive_split_cumprod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_consecutive_split_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_const_int32_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv1d_depthwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv1d_with_permute_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv2d_backward_channels_last_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv3d_channels_last_use_block_ptr_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_functional_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_inference_heuristics_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cudnn_rnn_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_default_layout_constraint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_scan_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div9_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_by_zero_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_precision_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtype_mismatch_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_float64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_elu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_erfinv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exact_stride_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expand_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expand_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expanded_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_basic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fill1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fill2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_index_expression_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_repr_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fmod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_forced_buffer_realize_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_truncation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_generate_rand_fp8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_glu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_constant_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_unbacked_symint_as_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_grid_sampler_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_failed_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_fallback2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_select_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_indirect_load_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_add_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_isinf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_broadcast_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_leaky_relu_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_mixed_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linspace4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_list_clearing_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_dynamic_shape_assertion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log_fp64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logaddexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logcumsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mark_unbacked_with_hint_override_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mix_device_index_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_move_arange_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_gpu_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_prime_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_narrow_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_needs_contiguous_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_new_ones_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nll_loss_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nll_loss_forward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_norm_constant_overflow_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_output_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pad_single_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pattern_matcher_unbacked_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_permute1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_digamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_erfc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_erfcx_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_erfinv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_gammaincc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i0e_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_multigammaln_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_round_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_xlog1py_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_zeta_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction_config_limit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_view_default_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_require_stride_expanded_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_round_correctness_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scalar_cpu_tensor_arg_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scalar_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scaled_dot_product_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_add2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_select_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_shape_prop_torch_ones_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_single_elem_indirect_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_one_kernel_loop_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_stable_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_failed_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_integer_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tanh_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor_index_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_device_constant_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_topk_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_transpose_add_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_cat_conv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_nearest3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_correction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_as_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_on_aliased_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_uint8_through_differing_bitwidths_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views7_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_weight_norm_bwd_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_weight_norm_conv2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_where_with_logical_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zero_dim_reductions_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zero_element_mutation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zeros_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_cat_unbacked_duplicate_size_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_constant_fold_uniform_value_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_item_inf_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_item_return_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_return_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_unbacked_stride_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_mark_unbacked_slice_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op5_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op9_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_multi_output_unbacked_custom_op_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_non_persistent_dynamic_rblock_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_noops_tensor_repropagate_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_slice_index_changing_sign_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sym_stride_lowering_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_cat_backwards_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_index_select_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unwrap_storage_didnt_work_repro_cuda
2025-12-04T11:43:07.8263435Z
2025-12-04T11:43:07.8263741Z Finished inductor/test_torchinductor_dynamic_shapes 4/4 ... [2025-12-04 11:43:07.790900][10227.719191973], took 9.77min
2025-12-04T11:43:07.8264674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-e6d2768dce09d0dd.xml
2025-12-04T11:43:07.9101187Z Running inductor/test_torchinductor_opinfo 2/13 ... [2025-12-04 11:43:07.909876][10227.838174483]
2025-12-04T11:43:07.9101723Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:43:07.9104140Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=2', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:43:07.910155]
2025-12-04T11:52:05.9037694Z
2025-12-04T11:52:05.9040055Z inductor/test_torchinductor_opinfo 2/13 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_2.13_c074395000e8f728_.log
2025-12-04T11:52:05.9130210Z Running 263 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float16,
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cauchy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gcd_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_quantile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_float64 2025-12-04T11:52:05.9215455Z 2025-12-04T11:52:05.9215722Z Finished inductor/test_torchinductor_opinfo 2/13 ... [2025-12-04 11:52:05.904431][10765.832724178], took 8.97min 2025-12-04T11:52:05.9267808Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-52326583abfcb307.xml 2025-12-04T11:52:06.0229283Z Running inductor/test_torchinductor_opinfo 7/13 ... 
[2025-12-04 11:52:06.022698][10765.95099546] 2025-12-04T11:52:06.0229769Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:52:06.0232720Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=7', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:52:06.023001] 2025-12-04T12:00:48.3400756Z 2025-12-04T12:00:48.3403594Z inductor/test_torchinductor_opinfo 7/13 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_7.13_206d55120439a46b_.log 2025-12-04T12:00:48.3567952Z Running 252 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_lengths_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_inverse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svd_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svdvals_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_uniform_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int64
2025-12-04T12:00:48.3726791Z
2025-12-04T12:00:48.3727270Z Finished inductor/test_torchinductor_opinfo 7/13 ... [2025-12-04 12:00:48.341053][11288.269344819], took 8.71min
2025-12-04T12:00:48.3728879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-b0e5cfa73b17bf79.xml
2025-12-04T12:00:48.9296959Z Uploading artifacts took 0.47 seconds
2025-12-04T12:00:48.9301249Z Running inductor/test_torchinductor_opinfo 12/13 ... [2025-12-04 12:00:48.929875][11288.858170883]
2025-12-04T12:00:48.9301942Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:00:48.9305405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=12', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-12-04 12:00:48.930239]
2025-12-04T12:11:53.7857524Z
2025-12-04T12:11:53.7858949Z inductor/test_torchinductor_opinfo 12/13 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.13_b0b968134062f752_.log
2025-12-04T12:11:53.7945947Z Running 259 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gcd_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eig_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_householder_product_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ormqr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rand_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_float16 2025-12-04T12:11:53.8030811Z 2025-12-04T12:11:53.8031094Z Finished inductor/test_torchinductor_opinfo 12/13 ... [2025-12-04 12:11:53.786124][11953.714418457], took 11.08min 2025-12-04T12:11:53.8093061Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-44c334397fb0c3bd.xml 2025-12-04T12:11:53.8847650Z Running inductor/test_cuda_repro 1/1 ... [2025-12-04 12:11:53.884448][11953.812743979] 2025-12-04T12:11:53.8848262Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:11:53.8850569Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_repro.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:11:53.884764] 2025-12-04T12:13:14.9904834Z 2025-12-04T12:13:14.9905869Z inductor/test_cuda_repro 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cuda_repro_1.1_1c12e5d6528c7a17_.log 2025-12-04T12:13:14.9931305Z Running 96 items in this shard: test/inductor/test_cuda_repro.py::CudaReproTests::test_3d_tiling, test/inductor/test_cuda_repro.py::CudaReproTests::test_accuracy_issue1, test/inductor/test_cuda_repro.py::CudaReproTests::test_adaptive_avg_pool3d_issue_157248, test/inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16, test/inductor/test_cuda_repro.py::CudaReproTests::test_autotune_inplace_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_backward_context, test/inductor/test_cuda_repro.py::CudaReproTests::test_bool_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_dynamic_dense, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_epilogue, test/inductor/test_cuda_repro.py::CudaReproTests::test_cat_int8_one_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_cpu_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_deterministic_algorithms, test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses, test/inductor/test_cuda_repro.py::CudaReproTests::test_dtype_factory_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_persistent_reductions, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_to_static_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned, test/inductor/test_cuda_repro.py::CudaReproTests::test_embedding_var_mean, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_low_precision, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_mean_ratio_chain, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_min_pow_chain, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_norm_rounding, test/inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs_no_size_asserts, test/inductor/test_cuda_repro.py::CudaReproTests::test_flash_attention_dynamic, test/inductor/test_cuda_repro.py::CudaReproTests::test_float64_constants, test/inductor/test_cuda_repro.py::CudaReproTests::test_float8_e8m0fnu, test/inductor/test_cuda_repro.py::CudaReproTests::test_full_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_identity_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_add_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_inplace_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_no_fallback_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_indirect_indexing_dense_mask, test/inductor/test_cuda_repro.py::CudaReproTests::test_inductor_output_aliases_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_add_alpha_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_buffer_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_updates_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_input_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_int64_index_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue100806, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103461, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103481, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue104759, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_1input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_2input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue_103924, test/inductor/test_cuda_repro.py::CudaReproTests::test_libdevice_routing, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_cpu_input, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_with_zero_infeature_size, test/inductor/test_cuda_repro.py::CudaReproTests::test_lookup_seed_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_max_autotune_nograd, test/inductor/test_cuda_repro.py::CudaReproTests::test_memory_history_inductor, test/inductor/test_cuda_repro.py::CudaReproTests::test_mm_out_dtype_compile, test/inductor/test_cuda_repro.py::CudaReproTests::test_multi_output_layout_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_mutated_aligned_tensor, test/inductor/test_cuda_repro.py::CudaReproTests::test_negative_arange_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_no_device_idx_repro_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_commutative_scan_op, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_contiguous_unaligned_input_indices, test/inductor/test_cuda_repro.py::CudaReproTests::test_normalize_norm_leq_one, test/inductor/test_cuda_repro.py::CudaReproTests::test_not_initializing_wrong_device, test/inductor/test_cuda_repro.py::CudaReproTests::test_permute_fusion, test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile, test/inductor/test_cuda_repro.py::CudaReproTests::test_red_dtype_mismatch, test/inductor/test_cuda_repro.py::CudaReproTests::test_reflection_pad_loop_order, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_repeated_masked_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_scalar_triton_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_scaled_dot_product_efficient_attention_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_scatter_index_not_wrapped, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape0_quantiles_strides0_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape1_quantiles_strides1_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape2_quantiles_strides2_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape3_quantiles_strides3_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape4_quantiles_strides4_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape5_quantiles_strides5_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape6_quantiles_strides6_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape7_quantiles_strides7_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_selecsls42b_misaligned_address, test/inductor/test_cuda_repro.py::CudaReproTests::test_simplify_dims, test/inductor/test_cuda_repro.py::CudaReproTests::test_sort_stride_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_sorted_masks, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_transposed, test/inductor/test_cuda_repro.py::CudaReproTests::test_triton_interpret, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_base_not_bitwise_equivalent, test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_emulate_divison_rounding, test/inductor/test_cuda_repro.py::CudaReproTests::test_uint_view_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_unspec_inputs_interop, test/inductor/test_cuda_repro.py::CudaReproTests::test_unused_cpu_input_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_view_replay_padding_issue_163328, test/inductor/test_cuda_repro.py::CudaReproTests::test_xlnet_lm_stride_repro 2025-12-04T12:13:14.9953941Z 2025-12-04T12:13:14.9954170Z Finished inductor/test_cuda_repro 1/1 ... [2025-12-04 12:13:14.990367][12034.918655146], took 1.35min 2025-12-04T12:13:15.0134591Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3098e6f6c63481df.xml 2025-12-04T12:13:15.1291233Z Running inductor/test_compiled_autograd 1/2 ... [2025-12-04 12:13:15.128819][12035.057111759] 2025-12-04T12:13:15.1291714Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:13:15.1294467Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:13:15.129170] 2025-12-04T12:20:32.3824219Z 2025-12-04T12:20:32.3825785Z inductor/test_compiled_autograd 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_1.2_dcced55d4b6d289b_.log 2025-12-04T12:20:32.3974890Z Running 438 items in this shard: test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_5_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_already_nan, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_backward, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_grad, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_basic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_data_dependent_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_id_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_non_traceable, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_dynamic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_float_is_traceable_True, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_backward_hook_relative_ordering_partial, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cache_hit, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_sac, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compiled_autograd_does_not_specialize_on_bw_symints, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cpu_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_graph, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_cpp_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_python_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_sdpa, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_compiled_fw_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_dynamically_defined_class, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_multiple_grads, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_attr, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_multiple_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_cpp_reducer_error, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_python_reducer, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_disk_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_annotations, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_eager_node, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamo_boxed, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_flex_attention, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_free_activation_memory_subclass, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_higher_order_gradients, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_cpp_extension, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_torch, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inplace_grad_update, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inputs_aliasing_bytecode_stack_restore, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_issue106555, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_keep_graph_usage_after_compiled, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_logging_tensor_flaky, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_output_nodes_all_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_tensor_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reset, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_saved_tensor_unpack_hook_ordering, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_only_backward_call, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_function_mode, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_run_with_rng_state, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes_hop, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_cpp, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_snapshot, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_access_saved_tensor_twice_without_recomputation_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_should_not_execute, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_with_zero_numel_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_assign_parent_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_detect_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_mode_no_check_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_view_of_view, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_creation_meta, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_cross_dtype, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_multiple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_simple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_views_codegen, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_copy, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_create_graph_warns, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_hook_relative_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_retained_graph_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_with_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_calculate_shape_util, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_adds_callback, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_cant_create_saved_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_detects_non_determinism, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_graph_execution_group, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_valid_reset_on_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_correct_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_custom_function_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_dataparallel, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_memory_savings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_create_graph_and_full_backward_hook_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_graph_task_execution_order, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_ac_early_stop, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_no_early_free, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_repeated_grad_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_tensor_before_tensor_args, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_wrong_formula, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_mark_dirty_not_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_preserve_torch_function_when_return_as_is, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saved_tensors, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saving_mutated_view_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_setup_context_simple, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_vmap_defaults, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_deep_reentrant, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dep_nograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dependent_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_base, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_then_inplace_raises_in_autograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks_nested, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_duplicate_backward_root, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_enable_grad_decorator_no_paren, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_first_grad_fn_access_in_no_grad_mode, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_complicated, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_pyfunction, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_get_data_and_hooks_from_raw_saved_variable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_empty_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_input_metadata, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf_register_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_thread_safety, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_materialize, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_unreachable_discovery, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_forward_or_backward_only, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_complex_non_complex_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_custom_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_dense_and_sparse_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_runs_with_no_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout2, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout4, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_test_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_validates_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_graph_save_on_cpu, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_edge_case_when_called_with_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_none, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hooks_cpp, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_indexing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_not_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_leaf_errors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_weak_grad_fn, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_integer_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_legacy_function_deprecation_exception, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_lobpcg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_mark_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_named_tensor_for_complex_views, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_anomaly_access, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_autograd_function_stashing_ctx, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_nested_anomaly_printstack_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_next_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_python_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_requires_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_unnecessary_save, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_not_implemented_fwad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pickle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_returns_not_None, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pow_zero_tensor_gradient, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_power_function, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_prehook_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_table, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_function_event_avg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_seq_nr, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_shapes, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_child_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_depth_0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_leaf_variable_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_requires_grad_, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retains_grad_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_none_for_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_on_cpu_and_checkpoint, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_output_nr, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_custom_function_intermediates, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_extra_enter_during_bw_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_scalar_grad_mixed_device, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_select_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_data_tensorimpl_type, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines_benign_exceptions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_enabled_wraps, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_generator_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_materialize_non_diff_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_shape, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sharded_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_both_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim_neg, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_ind_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_grad_warnings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_thread_shutdown, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_too_many_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unrelated_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unused_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_var_mean_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_view_func_replay_with_modified_state, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_volatile_deprecated, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_will_engine_execute_node, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_kwargs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_False, 
test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_same_graph_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_True, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op_with_CompositeExplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_grad_for_nontensor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_mutable, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_no_output, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_with_key_key_AutogradCUDA, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_tensorlist, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_tensorlist_input_requires_list_grads_with_same_numel, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_basic_make_fx, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_basic, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_nms_dynamic_compile, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_defined_in_python, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_duplicate_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_abstract_overload, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_cpu, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_invalid_devices, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_multiple, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CPU, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CompositeImplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_separate, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_supported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_unsupported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_qualname, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_schemas, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_functional_schema, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_tensorlist_like_type, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_define, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_meta_for_data_dependent_shape_operation, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_name_must_match, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_new_data_dependent_symint, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_meta, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_private_ctor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_supported_param_types, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_symints, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_unsupported_schemas, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_allow_python_side_effects_utility, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_constants, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_input_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_numpy_number, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_tracked, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_untracked_global_nested, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_branches_no_arguments, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_free_variable_in_both_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_graph_break_in_one_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_pytree_operands, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_side_effect_in_one_branches, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_with_constant_pred, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fallback_on_graph_break_simple, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_freevars_as_inputs_to_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_grad_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_no_hints, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hopify_generic_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_internal_nonlocal, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensors_with_compound_expressions, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_lowers_to_graph, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_multi_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_pytree_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_subgraph_name_is_valid, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_tuple_output, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_no_freevars, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_output_with_dict, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_register_subclass, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var_used_multiple_times, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_vars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_nonlocal_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_local_list_append_no_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_list, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_tensor, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_tensor_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nested_nonlocal_list_append_graph_break, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nonlocal_list_append_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_nonlocal_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_symint_in_slice, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_unbacked_symbol_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_multiply_scalar, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_allow_local_assign_in_body_fn, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_inductor_compiled_regions_option, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_default_else_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_only, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_recompile, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_source_fn_stack, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_functional_call_sequential_params_and_buffers, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_fn_with_kwargs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_python_scalar, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_pytree, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_side_effect, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_two_tensors_argnums, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_simple, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_two_tensors_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_teardown_resets_nested_graph_breaks, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs_python_struct, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_free_const, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_out_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_outputs_diff_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_over_vmap_captured, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_pytree_inputs, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_different_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_same_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects_append_input, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs_tuple_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_conditional_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_cond_with_invalid_kwargs, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_dropout_inductor, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond_unbalanced_branches, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_module, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_non_aliasing_util, 
test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_device_mesh_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_basic_export, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_dynamo_disable, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_graph_break, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_different_gradient_placement, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dont_recompile_on_same_placement_devicemesh, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_loss_parallel_log_softmax, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_slice, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamo_device_mesh_attrs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_graph_output, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_redistribute_unbalanced_correct_strides, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_requires_grad_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute, 
test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate_as_args, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs_forward_hook, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_fakify_dtensor, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_graph_input_is_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_placement_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_unwrap_async_collective_tensor_tangent, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_cond_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_quant_packed_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_subgraph_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_nested_cuda_float32, 
test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_while_loop_simple_cuda_float32 2025-12-04T12:20:32.4118574Z 2025-12-04T12:20:32.4118833Z Finished inductor/test_compiled_autograd 1/2 ... [2025-12-04 12:20:32.383014][12472.311307639], took 7.29min 2025-12-04T12:20:32.4119674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-d8fc516c8be54fc6.xml 2025-12-04T12:20:32.4905559Z Running inductor/test_layout_optim 1/1 ... [2025-12-04 12:20:32.490316][12472.418613128] 2025-12-04T12:20:32.4906150Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:20:32.4908647Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_layout_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:32.490611] 2025-12-04T12:20:37.8788968Z 2025-12-04T12:20:37.8790500Z inductor/test_layout_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_layout_optim_1.1_85312b2aa31c9171_.log 2025-12-04T12:20:37.8791336Z Running 0 items in this shard: 2025-12-04T12:20:37.8791517Z 2025-12-04T12:20:37.8791822Z Finished inductor/test_layout_optim 1/1 ... [2025-12-04 12:20:37.878647][12477.806944429], took 0.09min 2025-12-04T12:20:37.9013785Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_layout_optim/inductor.test_layout_optim-ff0e0fc528f4f3dd.xml 2025-12-04T12:20:37.9309908Z Running dynamo/test_exc 1/1 ... 
[2025-12-04 12:20:37.930781][12477.859080493] 2025-12-04T12:20:37.9310605Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:20:37.9313481Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_exc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:37.931075] 2025-12-04T12:20:43.7560415Z 2025-12-04T12:20:43.7561625Z dynamo/test_exc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_exc_1.1_e6b03d521cd15643_.log 2025-12-04T12:20:43.7564414Z Running 10 items in this shard: test/dynamo/test_exc.py::ExcTests::test_backend_suppress_line, test/dynamo/test_exc.py::ExcTests::test_graph_break_log, test/dynamo/test_exc.py::ExcTests::test_graph_break_log_generic_jump, test/dynamo/test_exc.py::ExcTests::test_internal_error_no_suppress, test/dynamo/test_exc.py::ExcTests::test_internal_error_suppress_errors, test/dynamo/test_exc.py::ExcTests::test_not_implemented_error, test/dynamo/test_exc.py::ExcTests::test_trigger_bisect_on_error, test/dynamo/test_exc.py::ExcTests::test_trigger_on_error, test/dynamo/test_exc.py::ExcTests::test_unsupported_error, test/dynamo/test_exc.py::ExcTests::test_unsupported_real_stack 2025-12-04T12:20:43.7566657Z 2025-12-04T12:20:43.7566850Z Finished dynamo/test_exc 1/1 ... [2025-12-04 12:20:43.755525][12483.683814946], took 0.10min 2025-12-04T12:20:43.7790662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_exc/dynamo.test_exc-59dcc92175511a1b.xml 2025-12-04T12:20:43.8117563Z Running inductor/test_aot_inductor_arrayref 1/2 ... 
[2025-12-04 12:20:43.811483][12483.739781926] 2025-12-04T12:20:43.8118091Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:20:43.8120869Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_arrayref.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:43.811808] 2025-12-04T12:26:56.1965797Z 2025-12-04T12:26:56.1967388Z inductor/test_aot_inductor_arrayref 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_arrayref_1.2_f8b4577ce160ed2e_.log 2025-12-04T12:26:56.2049360Z Running 159 items in this shard: test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_amp_fallback_random_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_constant_tensor_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_constant_tensor_name_collision_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_codegen_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_fp8_dtype_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_sym_inputs_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_user_defined_triton_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printing_model_inputs_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_profiler_enable_kernel_profile_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_profiler_enable_kernel_profile_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_runtime_asserts_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_autotune_int64_user_defined_triton_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_backward_no_op_logging_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_bool_input_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_2_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_3_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_reuse_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_clamp_decomposition_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_codegen_int_array_var_fix_memory_leak_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_composed_dynamic_size_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_cpu_predicate_cuda_operands_max_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_mismatched_branch_output_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_nested_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_non_tensor_predicates_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_share_predicate_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_symint_input_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_unbacked_symint_closure_dynamic_True_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_use_buffers_from_outer_scope_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_with_reinterpret_view_inputs_outputs_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_consecutive_compiles_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_constant_folding_with_update_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_conv3d_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_convolution_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_custom_op_in_subgraph_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_d2h_copy_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_deconv_freezing_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_device_moved_constant_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_duplicated_params_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_dynamic_scalar_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_dynamic_smem_above_default_limit_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_empty_cat_dtype_promotion_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_empty_constant_folding_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fake_tensor_device_validation_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fallback_mem_leak_fix_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fill__fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_foreach_multiple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fqn_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_free_inactive_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_index_put_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_dynamic_dim_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_grid_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_mmaped_weights_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_libtorch_free_so_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_linear_dynamic_maxautotune_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_missing_cubin_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_missing_output_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_mixed_device_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_model_modified_weights_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_multi_device_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_nan_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_nested_tensor_from_jagged_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_non_default_gpu_device_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_none_args_aot_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_normal_functional_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_on_gpu_device1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_misaligned_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_path_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_path_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_poi_multiple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_profile_benchmark_harness_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_proxy_executor_permute_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_quanatized_int8_linear_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_quantized_linear_bias_none_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_replace_unbacked_symbol_with_backed_expr_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_replicate_on_devices_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_return_view_constant_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_reuse_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_complex_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_dtype_failed_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_fp8_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_large_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_shape_failed_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_same_backing_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scaled_grouped_mm_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scatter_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scatter_reduce_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sdpa_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sdpa_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_seq_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_False_max_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_True_max_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_True_max_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_size_from_multi_output_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_size_with_unbacked_add_expr_transitive_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_stride_with_unbacked_expr_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_subclasses_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sym_i64_input_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_symint_item_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sympy_cpp_printer_min_max_minmax0_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_autotuning_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_bool_param_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_dynamic_grid_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_extern_kernel_arg_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_reinterpret_view_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_sympy_expr_arg_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_mutated_autotuning_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbounded_expr_substitutions_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_update_inactive_constant_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_using_model_name_for_files_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_weight_on_disk_legacy_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_conv_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_mixed_device_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_outer_buffers_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_pytree_inputs_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_sym_expr_cond_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_sym_expr_cond_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_grid_with_unbacked_symbols_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_size_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_size_weight_cpu_with_stack_allocation 2025-12-04T12:26:56.2126416Z 2025-12-04T12:26:56.2126674Z Finished inductor/test_aot_inductor_arrayref 1/2 ... 
[2025-12-04 12:26:56.196856][12856.125150635], took 6.21min 2025-12-04T12:26:56.2217878Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_arrayref/inductor.test_aot_inductor_arrayref-c35059cecd7c3b99.xml 2025-12-04T12:26:56.7595323Z Uploading artifacts took 0.45 seconds 2025-12-04T12:26:56.7597438Z Running inductor/test_halide 1/1 ... [2025-12-04 12:26:56.759534][12856.687830874] 2025-12-04T12:26:56.7597853Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:26:56.7600930Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_halide.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:26:56.759853] 2025-12-04T12:27:02.5066785Z 2025-12-04T12:27:02.5067632Z inductor/test_halide 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_halide_1.1_2220e392a986bf8c_.log 2025-12-04T12:27:02.5068239Z 2025-12-04T12:27:02.5068520Z Finished inductor/test_halide 1/1 ... [2025-12-04 12:27:02.506415][12862.434712532], took 0.10min 2025-12-04T12:27:02.5301946Z Running inductor/test_deterministic 1/3 ... [2025-12-04 12:27:02.529936][12862.458235174] 2025-12-04T12:27:02.5302440Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:27:02.5304887Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:27:02.530231] 2025-12-04T12:30:09.8791322Z 2025-12-04T12:30:09.8792247Z inductor/test_deterministic 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_deterministic_1.3_31baf2894918a3e4_.log 2025-12-04T12:30:09.8800484Z Running 15 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_max_autotune_deterministic_True, test/inductor/test_deterministic.py::DeterministicTest::test_mm_padding_deterministic_False, test/inductor/test_deterministic.py::DeterministicTest::test_mm_padding_deterministic_True, test/inductor/test_deterministic.py::DeterministicTest::test_reduction_coordesc_tuning_deterministic_False, test/inductor/test_deterministic.py::DeterministicTest::test_reduction_coordesc_tuning_deterministic_True, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_inference_precision_float16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_training_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_training_precision_bfloat16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_training_precision_float32, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_float16, 
test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_float32, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_bfloat16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_training_precision_float32 2025-12-04T12:30:09.8806831Z 2025-12-04T12:30:09.8807086Z Finished inductor/test_deterministic 1/3 ... [2025-12-04 12:30:09.878780][13049.807070703], took 3.12min 2025-12-04T12:30:09.9030727Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccee7b90c33901e0.xml 2025-12-04T12:30:09.9869459Z Running dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 12:30:09.986683][13049.914981784] 2025-12-04T12:30:09.9869951Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:30:09.9873184Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_deque_reconstruct.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:30:09.987035] 2025-12-04T12:30:13.7582095Z 2025-12-04T12:30:13.7582981Z dynamo/test_deque_reconstruct 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_deque_reconstruct_1.1_5be42757c28bbc33_.log 2025-12-04T12:30:13.7584858Z Running 3 items in this shard: test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_not_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_shallows_globals 2025-12-04T12:30:13.7586090Z 2025-12-04T12:30:13.7586389Z Finished dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 12:30:13.757844][13053.686136317], took 0.06min 2025-12-04T12:30:13.7811259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-4527efee43b2418d.xml 2025-12-04T12:30:13.8147987Z Running inductor/test_inductor_annotations 1/1 ... [2025-12-04 12:30:13.814558][13053.742857429] 2025-12-04T12:30:13.8148506Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:30:13.8151574Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_annotations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:30:13.814899] 2025-12-04T12:30:22.8954768Z 2025-12-04T12:30:22.8955772Z inductor/test_inductor_annotations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_annotations_1.1_acd21ad590bd7056_.log 2025-12-04T12:30:22.8957307Z Running 2 items in this shard: test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_no_annotations, test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_training_annotation 2025-12-04T12:30:22.8958143Z 2025-12-04T12:30:22.8958467Z Finished inductor/test_inductor_annotations 1/1 ... [2025-12-04 12:30:22.895073][13062.823365282], took 0.15min 2025-12-04T12:30:22.9186611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-1bfa13dfa66ba37a.xml 2025-12-04T12:30:23.0183258Z Running inductor/test_compile_worker 1/1 ... [2025-12-04 12:30:23.018074][13062.946373264] 2025-12-04T12:30:23.0183732Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:30:23.0186832Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_worker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:30:23.018380] 2025-12-04T12:31:11.0663471Z 2025-12-04T12:31:11.0665974Z inductor/test_compile_worker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_worker_1.1_22d83b26da38be9a_.log 2025-12-04T12:31:11.0673717Z Running 16 items in this shard: test/inductor/test_compile_worker.py::TestCompileWorker::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorker::test_crash, test/inductor/test_compile_worker.py::TestCompileWorker::test_exception, test/inductor/test_compile_worker.py::TestCompileWorker::test_logging, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_crash, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_exception, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_logging, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestTimer::test_basics, test/inductor/test_compile_worker.py::TestTimer::test_never_fires, test/inductor/test_compile_worker.py::TestTimer::test_repeated_calls, test/inductor/test_compile_worker.py::TestTimer::test_spammy_calls 2025-12-04T12:31:11.0680825Z 2025-12-04T12:31:11.0681244Z Finished inductor/test_compile_worker 1/1 ... [2025-12-04 12:31:11.065936][13110.994229199], took 0.80min 2025-12-04T12:31:11.0914856Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-d291715b5fb08603.xml 2025-12-04T12:31:11.1802949Z Running dynamo/test_fx_passes_pre_grad 1/1 ... 
[2025-12-04 12:31:11.180032][13111.108330738] 2025-12-04T12:31:11.1803441Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:31:11.1806121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_passes_pre_grad.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:31:11.180340] 2025-12-04T12:31:15.5021022Z 2025-12-04T12:31:15.5021983Z dynamo/test_fx_passes_pre_grad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_2cb42dcf29e7ede2_.log 2025-12-04T12:31:15.5023055Z Running 1 items in this shard: test/dynamo/test_fx_passes_pre_grad.py::FxPassesPreGradTests::test_pass_execution_and_save 2025-12-04T12:31:15.5023535Z 2025-12-04T12:31:15.5023882Z Finished dynamo/test_fx_passes_pre_grad 1/1 ... [2025-12-04 12:31:15.501784][13115.430081399], took 0.07min 2025-12-04T12:31:15.5262011Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-8f5f76cf24e3a322.xml 2025-12-04T12:31:15.5590830Z Running inductor/test_fp8 1/1 ... [2025-12-04 12:31:15.558848][13115.487148224] 2025-12-04T12:31:15.5591260Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:31:15.5594074Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:31:15.559149] 2025-12-04T12:33:36.6389122Z 2025-12-04T12:33:36.6390486Z PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_041887d0b8d7fee8_.log) 2025-12-04T12:33:36.6392832Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml 2025-12-04T12:33:36.6394729Z ============================= test session starts ============================== 2025-12-04T12:33:36.6395852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:36.6396806Z cachedir: .pytest_cache 2025-12-04T12:33:36.6397690Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:36.6398477Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:33:36.6398874Z configfile: pytest.ini 2025-12-04T12:33:36.6399616Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:33:36.6400472Z collecting ... 
collected 188 items 2025-12-04T12:33:36.6400798Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:33:36.6469652Z Running 188 items in this shard: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda 2025-12-04T12:33:36.6535388Z 2025-12-04T12:33:36.6535727Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda PASSED [1.6693s] [ 0%] 2025-12-04T12:33:36.6536411Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda PASSED [0.2223s] [ 1%] 2025-12-04T12:33:36.6537097Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda PASSED [0.6143s] [ 1%] 2025-12-04T12:33:36.6537766Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda PASSED [0.2459s] [ 2%] 2025-12-04T12:33:36.6538442Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.6223s] [ 2%] 2025-12-04T12:33:36.6539111Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [0.4404s] [ 3%] 2025-12-04T12:33:36.6539759Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.4560s] [ 3%] 2025-12-04T12:33:36.6540406Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.4745s] [ 4%] 2025-12-04T12:33:36.6541057Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.4719s] [ 4%] 2025-12-04T12:33:36.6541732Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6369s] [ 5%] 2025-12-04T12:33:36.6542366Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda PASSED [0.4181s] [ 5%] 2025-12-04T12:33:36.6542974Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda PASSED [0.4264s] [ 6%] 2025-12-04T12:33:36.6543579Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda PASSED [0.5481s] [ 6%] 2025-12-04T12:33:36.6544186Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda PASSED [0.5481s] [ 7%] 2025-12-04T12:33:36.6544797Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.4587s] [ 7%] 2025-12-04T12:33:36.6545393Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [0.1772s] [ 8%] 2025-12-04T12:33:36.6545981Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.1826s] [ 9%] 2025-12-04T12:33:36.6546585Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.2065s] [ 9%] 2025-12-04T12:33:36.6547340Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2018s] [ 10%] 2025-12-04T12:33:36.6547937Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.2166s] [ 10%] 2025-12-04T12:33:36.6549127Z inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] Error in codegen for ComputedBuffer(name='buf0', layout=FixedLayout('cuda:0', torch.float8_e5m2, size=[s77, s77, s77], stride=[s77**2, s77, 1]), data=Pointwise( 2025-12-04T12:33:36.6550143Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] 'cuda', 2025-12-04T12:33:36.6550641Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] torch.float8_e5m2, 2025-12-04T12:33:36.6551166Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] def inner_fn(index): 2025-12-04T12:33:36.6551713Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] i0, i1, i2 = index 2025-12-04T12:33:36.6552325Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] tmp0 = ops.load(arg1_1, i2 + i0 * s77**2 + i1 
* s77) 2025-12-04T12:33:36.6553063Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] tmp1 = ops.to_dtype(tmp0, torch.float8_e5m2, src_dtype=torch.float8_e4m3fn) 2025-12-04T12:33:36.6553722Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] return tmp1 2025-12-04T12:33:36.6554199Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] , 2025-12-04T12:33:36.6554700Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] ranges=[s77, s77, s77], 2025-12-04T12:33:36.6555283Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] origin_node=convert_element_type, 2025-12-04T12:33:36.6555932Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] origins=OrderedSet([convert_element_type]), 2025-12-04T12:33:36.6556518Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] stack_traces = {, 2025-12-04T12:33:36.6557233Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 164, in fp8_cast, 2025-12-04T12:33:36.6557984Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] return x.to(dtype=dtype), 2025-12-04T12:33:36.6558492Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] , 2025-12-04T12:33:36.6558919Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] } 2025-12-04T12:33:36.6559590Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] ), _split_size=None, _original_inner_fn=None, _original_ranges=None, _original_reduction_ranges=None) 2025-12-04T12:33:36.6560732Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] Error in codegen for ComputedBuffer(name='buf0', layout=FixedLayout('cuda:0', 
torch.float8_e4m3fn, size=[s77, s77, s77], stride=[s77**2, s77, 1]), data=Pointwise( 2025-12-04T12:33:36.6561580Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] 'cuda', 2025-12-04T12:33:36.6562083Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] torch.float8_e4m3fn, 2025-12-04T12:33:36.6562622Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] def inner_fn(index): 2025-12-04T12:33:36.6563158Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] i0, i1, i2 = index 2025-12-04T12:33:36.6563803Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] tmp0 = ops.load(arg1_1, i2 + i0 * s77**2 + i1 * s77) 2025-12-04T12:33:36.6564589Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] tmp1 = ops.to_dtype(tmp0, torch.float8_e4m3fn, src_dtype=torch.float8_e5m2) 2025-12-04T12:33:36.6565235Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] return tmp1 2025-12-04T12:33:36.6565733Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] , 2025-12-04T12:33:36.6566262Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] ranges=[s77, s77, s77], 2025-12-04T12:33:36.6566838Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] origin_node=convert_element_type, 2025-12-04T12:33:36.6567470Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] origins=OrderedSet([convert_element_type]), 2025-12-04T12:33:36.6568048Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] stack_traces = {, 2025-12-04T12:33:36.6568755Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 164, in fp8_cast, 2025-12-04T12:33:36.6569477Z 
C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] return x.to(dtype=dtype), 2025-12-04T12:33:36.6569987Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] , 2025-12-04T12:33:36.6570420Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] } 2025-12-04T12:33:36.6571085Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] ), _split_size=None, _original_inner_fn=None, _original_ranges=None, _original_reduction_ranges=None) 2025-12-04T12:33:36.6571642Z PASSED [0.4772s] [ 11%] 2025-12-04T12:33:36.6572287Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 W1204 12:31:30.420000 166614 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:33:36.6572936Z PASSED [1.0573s] [ 11%] 2025-12-04T12:33:36.6573340Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 ('RERUN', {'yellow': True}) [1.0094s] [ 12%] 2025-12-04T12:33:36.6574005Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 ('RERUN', {'yellow': True}) [0.9918s] [ 12%] 2025-12-04T12:33:36.6574611Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 FAILED [1.0006s] [ 12%] 2025-12-04T12:33:36.6574933Z 2025-12-04T12:33:36.6575044Z ==================================== RERUNS ==================================== 2025-12-04T12:33:36.6575380Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6575707Z Traceback (most recent call last): 2025-12-04T12:33:36.6576174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6576627Z method(*args, **kwargs) 2025-12-04T12:33:36.6577044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:33:36.6577488Z method(*args, **kwargs) 2025-12-04T12:33:36.6577899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6578331Z with policy(): 2025-12-04T12:33:36.6578736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6579179Z raise RuntimeError(msg) 2025-12-04T12:33:36.6579962Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 2025-12-04T12:33:36.6580770Z 2025-12-04T12:33:36.6580910Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6581422Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6581858Z 2025-12-04T12:33:36.6582023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6582441Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6582725Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6582970Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6583342Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6584164Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6584853Z graph_break [] 2025-12-04T12:33:36.6585154Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), 
('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6585596Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6585922Z Traceback (most recent call last): 2025-12-04T12:33:36.6586377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6586822Z method(*args, **kwargs) 2025-12-04T12:33:36.6587243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6587676Z method(*args, **kwargs) 2025-12-04T12:33:36.6588089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6588541Z with policy(): 2025-12-04T12:33:36.6588951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6589390Z raise RuntimeError(msg) 2025-12-04T12:33:36.6590152Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 
2025-12-04T12:33:36.6590893Z 2025-12-04T12:33:36.6591024Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6591532Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6591909Z 2025-12-04T12:33:36.6592070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6592442Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6592729Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6592958Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6593313Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6594130Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6594838Z graph_break [] 2025-12-04T12:33:36.6595128Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6595537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6595818Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6596039Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6596443Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6597320Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6598042Z graph_break [] 
2025-12-04T12:33:36.6598364Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6598741Z =================================== FAILURES =================================== 2025-12-04T12:33:36.6599098Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6599424Z Traceback (most recent call last): 2025-12-04T12:33:36.6599882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6600406Z method(*args, **kwargs) 2025-12-04T12:33:36.6600845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6601294Z method(*args, **kwargs) 2025-12-04T12:33:36.6601726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6602166Z with policy(): 2025-12-04T12:33:36.6602572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6603016Z raise RuntimeError(msg) 2025-12-04T12:33:36.6603789Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 
2025-12-04T12:33:36.6604530Z 2025-12-04T12:33:36.6604664Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6605183Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6605566Z 2025-12-04T12:33:36.6605729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6606104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6606390Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6606616Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6606977Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6607798Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6608484Z graph_break [] 2025-12-04T12:33:36.6608774Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6609185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6609469Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6609698Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6610055Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6610873Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6611567Z graph_break [] 
2025-12-04T12:33:36.6611854Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6612354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6612630Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6612854Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6613244Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6614102Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6614801Z graph_break [] 2025-12-04T12:33:36.6615087Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6615743Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml - 2025-12-04T12:33:36.6616329Z =========================== short test summary info ============================ 2025-12-04T12:33:36.6617656Z FAILED [1.0006s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 
2025-12-04T12:33:36.6618660Z 2025-12-04T12:33:36.6618795Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6619311Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6619696Z 2025-12-04T12:33:36.6619857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6620220Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:33:36.6620531Z ==================== 1 failed, 22 passed, 2 rerun in 13.85s ==================== 2025-12-04T12:33:36.6620789Z Got exit code 1 2025-12-04T12:33:36.6620953Z Retrying single test... 2025-12-04T12:33:36.6621343Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml 2025-12-04T12:33:36.6621801Z ============================= test session starts ============================== 2025-12-04T12:33:36.6622191Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:36.6622546Z cachedir: .pytest_cache 2025-12-04T12:33:36.6622967Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:36.6623423Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:33:36.6623631Z configfile: pytest.ini 2025-12-04T12:33:36.6624085Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:33:36.6624654Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:33:36.6625212Z stepcurrent: skipping 22 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6625709Z Running 1 items in this shard 2025-12-04T12:33:36.6625840Z 2025-12-04T12:33:36.6626384Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:31:41.862994488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6626985Z 2025-12-04T12:33:36.6627295Z [W1204 12:31:50.512756638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6627672Z 2025-12-04T12:33:36.6628050Z [W1204 12:31:50.512960432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6628476Z 2025-12-04T12:33:36.6628784Z [W1204 12:31:50.515354145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6629213Z 2025-12-04T12:33:36.6629509Z [W1204 12:31:50.515501617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6629882Z 2025-12-04T12:33:36.6630217Z [W1204 12:31:50.517340580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6630588Z 2025-12-04T12:33:36.6630889Z [W1204 12:31:50.517599655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6631255Z 2025-12-04T12:33:36.6631560Z [W1204 12:31:50.517727217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6631940Z 2025-12-04T12:33:36.6632230Z [W1204 12:31:50.518410819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6632603Z 2025-12-04T12:33:36.6632897Z [W1204 12:31:50.518539601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6633271Z 2025-12-04T12:33:36.6633568Z [W1204 12:31:50.518977349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6633938Z 2025-12-04T12:33:36.6634237Z [W1204 12:31:50.519106742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6634607Z 2025-12-04T12:33:36.6634901Z [W1204 12:31:50.519414497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6635275Z 2025-12-04T12:33:36.6635564Z [W1204 12:31:50.519541719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6635952Z 2025-12-04T12:33:36.6636244Z [W1204 12:31:50.519822724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6636622Z 2025-12-04T12:33:36.6636915Z [W1204 12:31:50.519947877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6637283Z 2025-12-04T12:33:36.6637580Z [W1204 12:31:50.520273063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6637950Z 2025-12-04T12:33:36.6638246Z [W1204 12:31:50.520423545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6638626Z 2025-12-04T12:33:36.6638911Z W1204 12:31:50.848000 167730 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:33:36.6639558Z [W1204 12:31:51.579566518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6639945Z 2025-12-04T12:33:36.6640331Z [W1204 12:31:51.579883053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6640702Z 2025-12-04T12:33:36.6641000Z [W1204 12:31:51.580040686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6641367Z 2025-12-04T12:33:36.6641663Z [W1204 12:31:51.580432262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6642030Z 2025-12-04T12:33:36.6642378Z [W1204 12:31:51.580566685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6642803Z 2025-12-04T12:33:36.6643096Z [W1204 12:31:51.580811919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6643509Z 2025-12-04T12:33:36.6643803Z [W1204 12:31:51.581020863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6644206Z 2025-12-04T12:33:36.6644507Z [W1204 12:31:51.581148295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6644876Z 2025-12-04T12:33:36.6645174Z [W1204 12:31:51.581457830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6645544Z 2025-12-04T12:33:36.6645836Z [W1204 12:31:51.581586652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6646225Z 2025-12-04T12:33:36.6646518Z [W1204 12:31:51.581889158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6646899Z 2025-12-04T12:33:36.6647191Z [W1204 12:31:51.582018500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6647567Z 2025-12-04T12:33:36.6647863Z [W1204 12:31:51.582289445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6648230Z 2025-12-04T12:33:36.6648533Z [W1204 12:31:51.582415297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6648901Z 2025-12-04T12:33:36.6649213Z [W1204 12:31:51.582670111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6649587Z 2025-12-04T12:33:36.6649883Z [W1204 12:31:51.582794043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6650260Z 2025-12-04T12:33:36.6650556Z [W1204 12:31:51.583048118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6650935Z 2025-12-04T12:33:36.6651230Z [W1204 12:31:51.583175210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6651600Z 2025-12-04T12:33:36.6651691Z ('RERUN', {'yellow': True}) [11.2691s] [100%] 2025-12-04T12:33:36.6652374Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:31:52.229000514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6652982Z 2025-12-04T12:33:36.6653279Z [W1204 12:31:52.229279409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6653652Z 2025-12-04T12:33:36.6653942Z [W1204 12:31:52.229416141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6654318Z 2025-12-04T12:33:36.6654612Z [W1204 12:31:52.229788888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6654981Z 2025-12-04T12:33:36.6655279Z [W1204 12:31:52.229922650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6655648Z 2025-12-04T12:33:36.6655943Z [W1204 12:31:52.230198395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6656313Z 2025-12-04T12:33:36.6656655Z [W1204 12:31:52.230407758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6657068Z 2025-12-04T12:33:36.6657357Z [W1204 12:31:52.230526210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6657771Z 2025-12-04T12:33:36.6658115Z [W1204 12:31:52.230837466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6658556Z 2025-12-04T12:33:36.6659039Z [W1204 12:31:52.230966008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6659441Z 2025-12-04T12:33:36.6659746Z [W1204 12:31:52.231267723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6660123Z 2025-12-04T12:33:36.6660425Z [W1204 12:31:52.231393465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6660811Z 2025-12-04T12:33:36.6661101Z [W1204 12:31:52.231661020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6661478Z 2025-12-04T12:33:36.6661770Z [W1204 12:31:52.231784352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6662138Z 2025-12-04T12:33:36.6662437Z [W1204 12:31:52.232053847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6662804Z 2025-12-04T12:33:36.6663101Z [W1204 12:31:52.232176649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6663471Z 2025-12-04T12:33:36.6663764Z [W1204 12:31:52.232447653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6664154Z 2025-12-04T12:33:36.6664450Z [W1204 12:31:52.232569575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6664825Z 2025-12-04T12:33:36.6665113Z [W1204 12:31:52.873924561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6665481Z 2025-12-04T12:33:36.6665777Z [W1204 12:31:52.874201396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6666145Z 2025-12-04T12:33:36.6666445Z [W1204 12:31:52.874333238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6666815Z 2025-12-04T12:33:36.6667106Z [W1204 12:31:52.874700785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6667483Z 2025-12-04T12:33:36.6667775Z [W1204 12:31:52.874833467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6668153Z 2025-12-04T12:33:36.6668448Z [W1204 12:31:52.875075051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6668837Z 2025-12-04T12:33:36.6669140Z [W1204 12:31:52.875281324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6669507Z 2025-12-04T12:33:36.6669804Z [W1204 12:31:52.875408187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6670174Z 2025-12-04T12:33:36.6670464Z [W1204 12:31:52.875719212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6670932Z 2025-12-04T12:33:36.6671239Z [W1204 12:31:52.875847714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6671616Z 2025-12-04T12:33:36.6671908Z [W1204 12:31:52.876154200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6672321Z 2025-12-04T12:33:36.6672672Z [W1204 12:31:52.876283882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6673042Z 2025-12-04T12:33:36.6673341Z [W1204 12:31:52.876561817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6673708Z 2025-12-04T12:33:36.6674006Z [W1204 12:31:52.876690239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6674378Z 2025-12-04T12:33:36.6674671Z [W1204 12:31:52.876946863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6675046Z 2025-12-04T12:33:36.6675337Z [W1204 12:31:52.877072726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6675715Z 2025-12-04T12:33:36.6676009Z [W1204 12:31:52.877332280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6676377Z 2025-12-04T12:33:36.6676674Z [W1204 12:31:52.877455782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6677055Z 2025-12-04T12:33:36.6677151Z ('RERUN', {'yellow': True}) [1.3607s] [100%] 2025-12-04T12:33:36.6677842Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:31:53.591663956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6678449Z 2025-12-04T12:33:36.6678743Z [W1204 12:31:53.591935161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6679125Z 2025-12-04T12:33:36.6679418Z [W1204 12:31:53.592066183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6679790Z 2025-12-04T12:33:36.6680208Z [W1204 12:31:53.592443970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6680579Z 2025-12-04T12:33:36.6680880Z [W1204 12:31:53.592574422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6681249Z 2025-12-04T12:33:36.6681543Z [W1204 12:31:53.592822476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6681931Z 2025-12-04T12:33:36.6682226Z [W1204 12:31:53.593022900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6682603Z 2025-12-04T12:33:36.6682896Z [W1204 12:31:53.593144812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6683267Z 2025-12-04T12:33:36.6683566Z [W1204 12:31:53.593450937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6683936Z 2025-12-04T12:33:36.6684233Z [W1204 12:31:53.593579739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6684605Z 2025-12-04T12:33:36.6684955Z [W1204 12:31:53.593888295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6685377Z 2025-12-04T12:33:36.6685669Z [W1204 12:31:53.594013927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6686080Z 2025-12-04T12:33:36.6686375Z [W1204 12:31:53.594281641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6686742Z 2025-12-04T12:33:36.6687071Z [W1204 12:31:53.594407573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6687442Z 2025-12-04T12:33:36.6687737Z [W1204 12:31:53.594660678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6688107Z 2025-12-04T12:33:36.6688400Z [W1204 12:31:53.594783290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6688778Z 2025-12-04T12:33:36.6689077Z [W1204 12:31:53.595041815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6689453Z 2025-12-04T12:33:36.6689747Z [W1204 12:31:53.595166257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6690116Z 2025-12-04T12:33:36.6690415Z [W1204 12:31:54.391603648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6690786Z 2025-12-04T12:33:36.6691083Z [W1204 12:31:54.391898333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6691450Z 2025-12-04T12:33:36.6691742Z [W1204 12:31:54.392036466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6692119Z 2025-12-04T12:33:36.6692411Z [W1204 12:31:54.392408172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6692783Z 2025-12-04T12:33:36.6693073Z [W1204 12:31:54.392540764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6693443Z 2025-12-04T12:33:36.6693743Z [W1204 12:31:54.392782868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6694203Z 2025-12-04T12:33:36.6694662Z [W1204 12:31:54.392986432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6695039Z 2025-12-04T12:33:36.6695329Z [W1204 12:31:54.393111974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6695702Z 2025-12-04T12:33:36.6696015Z [W1204 12:31:54.393427140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6696388Z 2025-12-04T12:33:36.6696678Z [W1204 12:31:54.393556732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6697063Z 2025-12-04T12:33:36.6697359Z [W1204 12:31:54.393856737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6697729Z 2025-12-04T12:33:36.6698027Z [W1204 12:31:54.393984509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6698397Z 2025-12-04T12:33:36.6698695Z [W1204 12:31:54.394246924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6699063Z 2025-12-04T12:33:36.6699410Z [W1204 12:31:54.394373356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6699823Z 2025-12-04T12:33:36.6700114Z [W1204 12:31:54.394625060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6700521Z 2025-12-04T12:33:36.6700845Z [W1204 12:31:54.394746863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6701217Z 2025-12-04T12:33:36.6701515Z [W1204 12:31:54.394999607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6701887Z 2025-12-04T12:33:36.6702187Z [W1204 12:31:54.395125199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6702565Z 2025-12-04T12:33:36.6702631Z FAILED [1.4154s] [100%] 2025-12-04T12:33:36.6702749Z 2025-12-04T12:33:36.6702842Z ==================================== RERUNS ==================================== 2025-12-04T12:33:36.6703187Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6703504Z Traceback (most recent call last): 2025-12-04T12:33:36.6703972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6704425Z method(*args, **kwargs) 2025-12-04T12:33:36.6704860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6705313Z method(*args, **kwargs) 2025-12-04T12:33:36.6705743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6706188Z with policy(): 2025-12-04T12:33:36.6706609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6707056Z raise RuntimeError(msg) 
2025-12-04T12:33:36.6707832Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 230686720 and is now 268435456. 2025-12-04T12:33:36.6708566Z 2025-12-04T12:33:36.6708709Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6709231Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6709609Z 2025-12-04T12:33:36.6709773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6710155Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6710447Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6710680Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6711427Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6712252Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6712579Z graph_break [] 2025-12-04T12:33:36.6712869Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6713274Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:33:36.6714229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:33:36.6715105Z if out == self.unknown_value: 2025-12-04T12:33:36.6715403Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6715719Z Traceback (most recent call last): 2025-12-04T12:33:36.6716226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6716682Z method(*args, **kwargs) 2025-12-04T12:33:36.6717493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6717966Z method(*args, **kwargs) 2025-12-04T12:33:36.6718411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6718853Z with policy(): 2025-12-04T12:33:36.6719266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6719725Z raise RuntimeError(msg) 2025-12-04T12:33:36.6720564Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 
2025-12-04T12:33:36.6721291Z 2025-12-04T12:33:36.6721425Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6721965Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6722350Z 2025-12-04T12:33:36.6722513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6722893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6723172Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6723404Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6724152Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6724957Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6725281Z graph_break [] 2025-12-04T12:33:36.6725578Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6725987Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:33:36.6726919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:33:36.6727765Z if out == self.unknown_value: 2025-12-04T12:33:36.6728042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6728326Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6728554Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6728927Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6729743Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6730434Z graph_break [] 2025-12-04T12:33:36.6730722Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6731085Z =================================== FAILURES =================================== 2025-12-04T12:33:36.6731500Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6731885Z Traceback (most recent call last): 2025-12-04T12:33:36.6732347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6732856Z method(*args, **kwargs) 2025-12-04T12:33:36.6733287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6733772Z method(*args, **kwargs) 2025-12-04T12:33:36.6734195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6734645Z with policy(): 2025-12-04T12:33:36.6735045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6735489Z raise 
RuntimeError(msg) 2025-12-04T12:33:36.6736257Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 2025-12-04T12:33:36.6736979Z 2025-12-04T12:33:36.6737110Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6737638Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6738015Z 2025-12-04T12:33:36.6738176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6738549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6738829Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6739057Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6739780Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6740595Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6740919Z graph_break [] 2025-12-04T12:33:36.6741207Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6741601Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:33:36.6742507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:33:36.6743344Z if out == self.unknown_value: 2025-12-04T12:33:36.6743599Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6743884Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6744111Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6744464Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6745279Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6745968Z graph_break [] 2025-12-04T12:33:36.6746261Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6746665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6746935Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6747162Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6747607Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6748419Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6749132Z graph_break [] 2025-12-04T12:33:36.6749461Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6750120Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml - 2025-12-04T12:33:36.6750674Z =========================== short test summary 
info ============================ 2025-12-04T12:33:36.6751793Z FAILED [1.4154s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 2025-12-04T12:33:36.6752791Z 2025-12-04T12:33:36.6752923Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6753439Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6753819Z 2025-12-04T12:33:36.6753985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6754324Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:33:36.6754637Z ================= 1 failed, 187 deselected, 2 rerun in 14.09s ================== 2025-12-04T12:33:36.6754906Z Got exit code 1 2025-12-04T12:33:36.6755072Z Retrying single test... 
2025-12-04T12:33:36.6755475Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml 2025-12-04T12:33:36.6755937Z ============================= test session starts ============================== 2025-12-04T12:33:36.6756329Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:36.6756678Z cachedir: .pytest_cache 2025-12-04T12:33:36.6757095Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:36.6757561Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:33:36.6757774Z configfile: pytest.ini 2025-12-04T12:33:36.6758230Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:33:36.6758796Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:33:36.6759364Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6759854Z Running 1 items in this shard 2025-12-04T12:33:36.6760071Z 2025-12-04T12:33:36.6760609Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:32:02.355911129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6761218Z 2025-12-04T12:33:36.6761522Z [W1204 12:32:11.539936289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6761907Z 2025-12-04T12:33:36.6762201Z [W1204 12:32:11.540171263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6762580Z 2025-12-04T12:33:36.6762881Z [W1204 12:32:11.542493394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6763313Z 2025-12-04T12:33:36.6763648Z [W1204 12:32:11.542636526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6764017Z 2025-12-04T12:33:36.6764308Z [W1204 12:32:11.544468648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6764719Z 2025-12-04T12:33:36.6765053Z [W1204 12:32:11.544727433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6765430Z 2025-12-04T12:33:36.6765718Z [W1204 12:32:11.544858445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6766087Z 2025-12-04T12:33:36.6766384Z [W1204 12:32:11.545241531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6766754Z 2025-12-04T12:33:36.6767056Z [W1204 12:32:11.545371474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6767422Z 2025-12-04T12:33:36.6767725Z [W1204 12:32:11.545802361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6768103Z 2025-12-04T12:33:36.6768394Z [W1204 12:32:11.545930323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6768769Z 2025-12-04T12:33:36.6769059Z [W1204 12:32:11.546239809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6769429Z 2025-12-04T12:33:36.6769724Z [W1204 12:32:11.546368511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6770091Z 2025-12-04T12:33:36.6770398Z [W1204 12:32:11.546639236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6770771Z 2025-12-04T12:33:36.6771063Z [W1204 12:32:11.546763178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6771439Z 2025-12-04T12:33:36.6771731Z [W1204 12:32:11.547039163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6772104Z 2025-12-04T12:33:36.6772396Z [W1204 12:32:11.547161705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6772766Z 2025-12-04T12:33:36.6773052Z W1204 12:32:11.875000 168209 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:33:36.6773712Z [W1204 12:32:12.641261482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6774088Z 2025-12-04T12:33:36.6774381Z [W1204 12:32:12.641588608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6774764Z 2025-12-04T12:33:36.6775063Z [W1204 12:32:12.641724280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6775447Z 2025-12-04T12:33:36.6775742Z [W1204 12:32:12.642103397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6776115Z 2025-12-04T12:33:36.6776413Z [W1204 12:32:12.642234109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6776783Z 2025-12-04T12:33:36.6777124Z [W1204 12:32:12.642480253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6777554Z 2025-12-04T12:33:36.6777856Z [W1204 12:32:12.642715317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6778235Z 2025-12-04T12:33:36.6778567Z [W1204 12:32:12.642841750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6778942Z 2025-12-04T12:33:36.6779271Z [W1204 12:32:12.643157025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6779641Z 2025-12-04T12:33:36.6779939Z [W1204 12:32:12.643285367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6780305Z 2025-12-04T12:33:36.6780605Z [W1204 12:32:12.643600583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6780981Z 2025-12-04T12:33:36.6781275Z [W1204 12:32:12.643725795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6781654Z 2025-12-04T12:33:36.6781947Z [W1204 12:32:12.643995310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6782327Z 2025-12-04T12:33:36.6782621Z [W1204 12:32:12.644118202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6782990Z 2025-12-04T12:33:36.6783290Z [W1204 12:32:12.644386106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6783663Z 2025-12-04T12:33:36.6783959Z [W1204 12:32:12.644507579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6784334Z 2025-12-04T12:33:36.6784631Z [W1204 12:32:12.644764583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6785009Z 2025-12-04T12:33:36.6785308Z [W1204 12:32:12.644884705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6785698Z 2025-12-04T12:33:36.6785784Z ('RERUN', {'yellow': True}) [11.8591s] [100%] 2025-12-04T12:33:36.6786494Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:32:13.291398934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6787094Z 2025-12-04T12:33:36.6787395Z [W1204 12:32:13.291673739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6787770Z 2025-12-04T12:33:36.6788066Z [W1204 12:32:13.291802321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6788441Z 2025-12-04T12:33:36.6788732Z [W1204 12:32:13.292171368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6789107Z 2025-12-04T12:33:36.6789402Z [W1204 12:32:13.292308270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6789771Z 2025-12-04T12:33:36.6790065Z [W1204 12:32:13.292566764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6790430Z 2025-12-04T12:33:36.6790725Z [W1204 12:32:13.292769018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6791094Z 2025-12-04T12:33:36.6791431Z [W1204 12:32:13.292887450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6791852Z 2025-12-04T12:33:36.6792146Z [W1204 12:32:13.293206535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6792558Z 2025-12-04T12:33:36.6792858Z [W1204 12:32:13.293331308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6793237Z 2025-12-04T12:33:36.6793563Z [W1204 12:32:13.293640673 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6793931Z 2025-12-04T12:33:36.6794228Z [W1204 12:32:13.293765235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6794595Z 2025-12-04T12:33:36.6794888Z [W1204 12:32:13.294034340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6795265Z 2025-12-04T12:33:36.6795554Z [W1204 12:32:13.294156912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6795932Z 2025-12-04T12:33:36.6796222Z [W1204 12:32:13.294416456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6796597Z 2025-12-04T12:33:36.6796897Z [W1204 12:32:13.294535638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6797270Z 2025-12-04T12:33:36.6797569Z [W1204 12:32:13.294801713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6797939Z 2025-12-04T12:33:36.6798233Z [W1204 12:32:13.294921935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6798604Z 2025-12-04T12:33:36.6798894Z [W1204 12:32:14.953990260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6799267Z 2025-12-04T12:33:36.6799558Z [W1204 12:32:14.954268835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6799936Z 2025-12-04T12:33:36.6800297Z [W1204 12:32:14.954412437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6800666Z 2025-12-04T12:33:36.6800966Z [W1204 12:32:14.954782904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6801347Z 2025-12-04T12:33:36.6801646Z [W1204 12:32:14.954916226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6802014Z 2025-12-04T12:33:36.6802320Z [W1204 12:32:14.955159930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6802695Z 2025-12-04T12:33:36.6802986Z [W1204 12:32:14.955361604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6803361Z 2025-12-04T12:33:36.6803656Z [W1204 12:32:14.955485696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6804030Z 2025-12-04T12:33:36.6804326Z [W1204 12:32:14.955799142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6804693Z 2025-12-04T12:33:36.6804997Z [W1204 12:32:14.955931144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6805364Z 2025-12-04T12:33:36.6805707Z [W1204 12:32:14.956231759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6806115Z 2025-12-04T12:33:36.6806411Z [W1204 12:32:14.956368431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6806821Z 2025-12-04T12:33:36.6807144Z [W1204 12:32:14.956639466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6807512Z 2025-12-04T12:33:36.6807811Z [W1204 12:32:14.956764518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6808181Z 2025-12-04T12:33:36.6808481Z [W1204 12:32:14.957017083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6808853Z 2025-12-04T12:33:36.6809145Z [W1204 12:32:14.957139385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6809520Z 2025-12-04T12:33:36.6809813Z [W1204 12:32:14.957393479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6810184Z 2025-12-04T12:33:36.6810475Z [W1204 12:32:14.957513762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6810850Z 2025-12-04T12:33:36.6810938Z ('RERUN', {'yellow': True}) [1.3700s] [100%] 2025-12-04T12:33:36.6811626Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:32:14.663759956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6812223Z 2025-12-04T12:33:36.6812523Z [W1204 12:32:14.664035000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6812907Z 2025-12-04T12:33:36.6813200Z [W1204 12:32:14.664165683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6813575Z 2025-12-04T12:33:36.6813871Z [W1204 12:32:14.664548099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6814243Z 2025-12-04T12:33:36.6814542Z [W1204 12:32:14.664680671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6814913Z 2025-12-04T12:33:36.6815218Z [W1204 12:32:14.664930116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6815603Z 2025-12-04T12:33:36.6815894Z [W1204 12:32:14.665139090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6816274Z 2025-12-04T12:33:36.6816569Z [W1204 12:32:14.665261612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6816947Z 2025-12-04T12:33:36.6817412Z [W1204 12:32:14.665577057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6817791Z 2025-12-04T12:33:36.6818087Z [W1204 12:32:14.665706949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6818458Z 2025-12-04T12:33:36.6818758Z [W1204 12:32:14.666013265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6819128Z 2025-12-04T12:33:36.6819422Z [W1204 12:32:14.666139997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6819933Z 2025-12-04T12:33:36.6820229Z [W1204 12:32:14.666406761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6820604Z 2025-12-04T12:33:36.6820898Z [W1204 12:32:14.666530523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6821331Z 2025-12-04T12:33:36.6821674Z [W1204 12:32:14.666789618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6822048Z 2025-12-04T12:33:36.6822349Z [W1204 12:32:14.666911360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6822719Z 2025-12-04T12:33:36.6823026Z [W1204 12:32:14.667170594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6823391Z 2025-12-04T12:33:36.6823686Z [W1204 12:32:14.667293976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6824058Z 2025-12-04T12:33:36.6824349Z [W1204 12:32:15.388530850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6824721Z 2025-12-04T12:33:36.6825012Z [W1204 12:32:15.388827585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6825388Z 2025-12-04T12:33:36.6825683Z [W1204 12:32:15.388959288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6826050Z 2025-12-04T12:33:36.6826347Z [W1204 12:32:15.389329914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6826717Z 2025-12-04T12:33:36.6827010Z [W1204 12:32:15.389459516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6827383Z 2025-12-04T12:33:36.6827677Z [W1204 12:32:15.389698960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6828054Z 2025-12-04T12:33:36.6828347Z [W1204 12:32:15.389904694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6828713Z 2025-12-04T12:33:36.6829012Z [W1204 12:32:15.390056827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6829379Z 2025-12-04T12:33:36.6829677Z [W1204 12:32:15.390382852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6830044Z 2025-12-04T12:33:36.6830334Z [W1204 12:32:15.390509814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6830708Z 2025-12-04T12:33:36.6831001Z [W1204 12:32:15.390815710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6831387Z 2025-12-04T12:33:36.6831682Z [W1204 12:32:15.390942132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6832052Z 2025-12-04T12:33:36.6832349Z [W1204 12:32:15.391203537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6832720Z 2025-12-04T12:33:36.6833016Z [W1204 12:32:15.391327149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6833384Z 2025-12-04T12:33:36.6833737Z [W1204 12:32:15.391574323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6834151Z 2025-12-04T12:33:36.6834441Z [W1204 12:32:15.391701245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6834859Z 2025-12-04T12:33:36.6835171Z [W1204 12:32:15.391958200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:33:36.6835544Z 2025-12-04T12:33:36.6835879Z [W1204 12:32:15.392081102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:33:36.6836249Z 2025-12-04T12:33:36.6836321Z FAILED [1.3491s] [100%] 2025-12-04T12:33:36.6836430Z 2025-12-04T12:33:36.6836521Z ==================================== RERUNS ==================================== 2025-12-04T12:33:36.6836864Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6837186Z Traceback (most recent call last): 2025-12-04T12:33:36.6837659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6838109Z method(*args, **kwargs) 2025-12-04T12:33:36.6838542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6838990Z method(*args, **kwargs) 2025-12-04T12:33:36.6839416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6839878Z with policy(): 2025-12-04T12:33:36.6840338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6840792Z raise RuntimeError(msg) 2025-12-04T12:33:36.6841564Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 230686720 and is now 268435456. 
2025-12-04T12:33:36.6842311Z 2025-12-04T12:33:36.6842445Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6842972Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6843364Z 2025-12-04T12:33:36.6843536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6843912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6844206Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6844440Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6845181Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6845991Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6846318Z graph_break [] 2025-12-04T12:33:36.6846614Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6847011Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:33:36.6847921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:33:36.6848778Z if out == self.unknown_value: 2025-12-04T12:33:36.6849076Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6849392Z Traceback (most recent call last): 2025-12-04T12:33:36.6849933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6850383Z method(*args, **kwargs) 2025-12-04T12:33:36.6850807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6851279Z method(*args, **kwargs) 2025-12-04T12:33:36.6851725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6852182Z with policy(): 2025-12-04T12:33:36.6852581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6853044Z raise RuntimeError(msg) 2025-12-04T12:33:36.6853813Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 
2025-12-04T12:33:36.6854532Z 2025-12-04T12:33:36.6854676Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6855198Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6855591Z 2025-12-04T12:33:36.6855756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6856126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6856405Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6856630Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6857359Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6858185Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6858519Z graph_break [] 2025-12-04T12:33:36.6858809Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6859216Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:33:36.6860127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:33:36.6860963Z if out == self.unknown_value: 2025-12-04T12:33:36.6861222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6861503Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6861737Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6862096Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6862910Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6863610Z graph_break [] 2025-12-04T12:33:36.6863898Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6864251Z =================================== FAILURES =================================== 2025-12-04T12:33:36.6864592Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:33:36.6864909Z Traceback (most recent call last): 2025-12-04T12:33:36.6865421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6865914Z method(*args, **kwargs) 2025-12-04T12:33:36.6866345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6866790Z method(*args, **kwargs) 2025-12-04T12:33:36.6867240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6867692Z with policy(): 2025-12-04T12:33:36.6868138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6868591Z raise 
RuntimeError(msg) 2025-12-04T12:33:36.6869351Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 2025-12-04T12:33:36.6870084Z 2025-12-04T12:33:36.6870216Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6870733Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6871115Z 2025-12-04T12:33:36.6871283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6871643Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6871927Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6872158Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6872886Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6873685Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6874009Z graph_break [] 2025-12-04T12:33:36.6874307Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6874709Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:33:36.6875620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:33:36.6876457Z if out == self.unknown_value: 2025-12-04T12:33:36.6876712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6876989Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6877211Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6877567Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6878383Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6879067Z graph_break [] 2025-12-04T12:33:36.6879347Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6879747Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6880073Z frames [('total', 2), ('ok', 2)] 2025-12-04T12:33:36.6880305Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T12:33:36.6880671Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T12:33:36.6881531Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T12:33:36.6882264Z graph_break [] 2025-12-04T12:33:36.6882554Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T12:33:36.6883255Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml - 2025-12-04T12:33:36.6883874Z =========================== short test summary 
info ============================ 2025-12-04T12:33:36.6884995Z FAILED [1.3491s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456. 2025-12-04T12:33:36.6885989Z 2025-12-04T12:33:36.6886134Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6886644Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6887034Z 2025-12-04T12:33:36.6887196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6887544Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:33:36.6887856Z ================= 1 failed, 187 deselected, 2 rerun in 14.62s ================== 2025-12-04T12:33:36.6888125Z Got exit code 1 2025-12-04T12:33:36.6888484Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T12:33:36.6889059Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:33:36.6889661Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml 2025-12-04T12:33:36.6890123Z ============================= test session starts ============================== 2025-12-04T12:33:36.6890517Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:36.6890874Z cachedir: .pytest_cache 2025-12-04T12:33:36.6891290Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:36.6891750Z 
rootdir: /var/lib/jenkins/workspace 2025-12-04T12:33:36.6891980Z configfile: pytest.ini 2025-12-04T12:33:36.6892440Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:33:36.6893006Z collecting ... collected 188 items / 23 deselected / 165 selected 2025-12-04T12:33:36.6893304Z stepcurrent: skipping 23 already run items. 2025-12-04T12:33:36.6893532Z Running 165 items in this shard 2025-12-04T12:33:36.6893661Z 2025-12-04T12:33:36.6894027Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda PASSED [3.1339s] [ 0%] 2025-12-04T12:33:36.6894808Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda PASSED [1.8887s] [ 1%] 2025-12-04T12:33:36.6895585Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda PASSED [1.8864s] [ 1%] 2025-12-04T12:33:36.6896353Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda PASSED [1.8860s] [ 2%] 2025-12-04T12:33:36.6897095Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda PASSED [0.5796s] [ 3%] 2025-12-04T12:33:36.6897896Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.7819s] [ 3%] 2025-12-04T12:33:36.6898665Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.9509s] [ 4%] 2025-12-04T12:33:36.6899406Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.6066s] [ 4%] 2025-12-04T12:33:36.6900212Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.6533s] [ 5%] 2025-12-04T12:33:36.6900966Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.2766s] [ 6%] 2025-12-04T12:33:36.6901696Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.3926s] [ 6%] 2025-12-04T12:33:36.6902426Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.6147s] [ 7%] 2025-12-04T12:33:36.6903161Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.3301s] [ 7%] 2025-12-04T12:33:36.6903904Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [0.6516s] [ 8%] 2025-12-04T12:33:36.6904640Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda PASSED [0.4575s] [ 9%] 2025-12-04T12:33:36.6905361Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.3974s] [ 9%] 2025-12-04T12:33:36.6906087Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.6255s] [ 10%] 2025-12-04T12:33:36.6906808Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.3353s] [ 10%] 2025-12-04T12:33:36.6907538Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.6509s] [ 11%] 2025-12-04T12:33:36.6908267Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda PASSED 
[0.2751s] [ 12%] 2025-12-04T12:33:36.6908980Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.3896s] [ 12%] 2025-12-04T12:33:36.6909695Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.6187s] [ 13%] 2025-12-04T12:33:36.6910416Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.3295s] [ 13%] 2025-12-04T12:33:36.6911140Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [0.6494s] [ 14%] 2025-12-04T12:33:36.6911844Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.6538s] [ 15%] 2025-12-04T12:33:36.6912536Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.6588s] [ 15%] 2025-12-04T12:33:36.6913222Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda PASSED [0.4600s] [ 16%] 2025-12-04T12:33:36.6913892Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.5813s] [ 16%] 2025-12-04T12:33:36.6914560Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.4348s] [ 17%] 2025-12-04T12:33:36.6915226Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.5822s] [ 18%] 2025-12-04T12:33:36.6915945Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda PASSED [0.4369s] [ 18%] 2025-12-04T12:33:36.6926587Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.8019s] [ 19%] 2025-12-04T12:33:36.6927516Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.4047s] [ 20%] 2025-12-04T12:33:36.6928288Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.5407s] [ 20%] 2025-12-04T12:33:36.6928970Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda PASSED [0.4070s] [ 21%] 2025-12-04T12:33:36.6929625Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.5416s] [ 21%] 2025-12-04T12:33:36.6930316Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 PASSED [0.2214s] [ 22%] 2025-12-04T12:33:36.6930992Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 PASSED [0.3850s] [ 23%] 2025-12-04T12:33:36.6931670Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 PASSED [0.2096s] [ 23%] 2025-12-04T12:33:36.6932331Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 PASSED [0.3803s] [ 24%] 2025-12-04T12:33:36.6932986Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 PASSED [0.2056s] [ 24%] 2025-12-04T12:33:36.6933634Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 PASSED [0.3968s] [ 25%] 2025-12-04T12:33:36.6934261Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda PASSED [0.1149s] [ 26%] 2025-12-04T12:33:36.6934840Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda PASSED [0.1153s] [ 26%] 2025-12-04T12:33:36.6935617Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0003s] (Need 
device-side TMA support in Triton) [ 27%] 2025-12-04T12:33:36.6936596Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 27%] 2025-12-04T12:33:36.6937555Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 28%] 2025-12-04T12:33:36.6938536Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 29%] 2025-12-04T12:33:36.6939496Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 29%] 2025-12-04T12:33:36.6940475Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 30%] 2025-12-04T12:33:36.6941441Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 30%] 2025-12-04T12:33:36.6942391Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 31%] 2025-12-04T12:33:36.6943197Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda SKIPPED [0.0002s] (Not supported on non B200) [ 32%] 2025-12-04T12:33:36.6943824Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda PASSED [5.0432s] [ 32%] 2025-12-04T12:33:36.6944803Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda W1204 12:32:55.525000 
168690 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:33:36.6945658Z ('RERUN', {'yellow': True}) [0.5648s] [ 33%] 2025-12-04T12:33:36.6946300Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5077s] [ 33%] 2025-12-04T12:33:36.6947193Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.5015s] [ 33%] 2025-12-04T12:33:36.6947669Z 2025-12-04T12:33:36.6947771Z ==================================== RERUNS ==================================== 2025-12-04T12:33:36.6948236Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.6948695Z Traceback (most recent call last): 2025-12-04T12:33:36.6949223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6949683Z method(*args, **kwargs) 2025-12-04T12:33:36.6950134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6950590Z method(*args, **kwargs) 2025-12-04T12:33:36.6950998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6951431Z with policy(): 2025-12-04T12:33:36.6951825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6952265Z raise RuntimeError(msg) 2025-12-04T12:33:36.6953209Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632. 2025-12-04T12:33:36.6954116Z 2025-12-04T12:33:36.6954251Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6954913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T12:33:36.6955440Z 2025-12-04T12:33:36.6955608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6955979Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6956262Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.6956482Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.6956837Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.6957223Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.6957474Z graph_break [] 2025-12-04T12:33:36.6957678Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T12:33:36.6958139Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.6958564Z Traceback (most recent call last): 2025-12-04T12:33:36.6959012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6959463Z method(*args, **kwargs) 2025-12-04T12:33:36.6959873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6960372Z method(*args, **kwargs) 2025-12-04T12:33:36.6960828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T12:33:36.6961308Z with policy(): 2025-12-04T12:33:36.6961723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6962199Z raise RuntimeError(msg) 2025-12-04T12:33:36.6963166Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632. 2025-12-04T12:33:36.6964046Z 2025-12-04T12:33:36.6964184Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6964854Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T12:33:36.6965388Z 2025-12-04T12:33:36.6965559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6965945Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6966227Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.6966451Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.6966805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.6967179Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.6967419Z graph_break [] 2025-12-04T12:33:36.6967609Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T12:33:36.6967912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6968199Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.6968414Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.6968760Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.6969127Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.6969363Z graph_break [] 2025-12-04T12:33:36.6969557Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T12:33:36.6969834Z =================================== FAILURES =================================== 2025-12-04T12:33:36.6970285Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.6970717Z Traceback (most recent call last): 2025-12-04T12:33:36.6971178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6971635Z method(*args, **kwargs) 2025-12-04T12:33:36.6972041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6972499Z method(*args, **kwargs) 2025-12-04T12:33:36.6972946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.6973391Z with policy(): 2025-12-04T12:33:36.6973787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.6974246Z raise RuntimeError(msg) 2025-12-04T12:33:36.6975196Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632. 
2025-12-04T12:33:36.6976081Z 2025-12-04T12:33:36.6976218Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6976932Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T12:33:36.6977506Z 2025-12-04T12:33:36.6977672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6978080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6978363Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.6978625Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.6978989Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.6979370Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.6979624Z graph_break [] 2025-12-04T12:33:36.6979828Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T12:33:36.6980147Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6980424Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.6980638Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.6980985Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.6981353Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.6981602Z graph_break [] 2025-12-04T12:33:36.6981806Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T12:33:36.6982117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.6982388Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.6982601Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.6982956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.6983334Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.6983574Z graph_break [] 2025-12-04T12:33:36.6983769Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T12:33:36.6984341Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml - 2025-12-04T12:33:36.6984917Z =========================== short test summary info ============================ 2025-12-04T12:33:36.6986329Z FAILED [0.5015s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632. 2025-12-04T12:33:36.6987624Z 2025-12-04T12:33:36.6987755Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.6988411Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T12:33:36.6988958Z 2025-12-04T12:33:36.6989119Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.6989467Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:33:36.6989793Z ======= 1 failed, 45 passed, 9 skipped, 23 deselected, 2 rerun in 34.64s ======= 2025-12-04T12:33:36.6990077Z Got exit code 1 2025-12-04T12:33:36.6990242Z Retrying single test... 
2025-12-04T12:33:36.6990639Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml 2025-12-04T12:33:36.6991093Z ============================= test session starts ============================== 2025-12-04T12:33:36.6991479Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:36.6991831Z cachedir: .pytest_cache 2025-12-04T12:33:36.6992306Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:36.6992804Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:33:36.6993021Z configfile: pytest.ini 2025-12-04T12:33:36.6993475Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:33:36.6994097Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:33:36.6994807Z stepcurrent: skipping 77 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T12:33:36.6995464Z Running 1 items in this shard 2025-12-04T12:33:36.6995592Z 2025-12-04T12:33:36.6996223Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda W1204 12:33:04.450000 170395 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:33:36.6997021Z ('RERUN', {'yellow': True}) [1.7100s] [100%] 2025-12-04T12:33:36.6997558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2665s] [100%] 2025-12-04T12:33:36.6998010Z 2025-12-04T12:33:36.6998102Z ==================================== RERUNS ==================================== 2025-12-04T12:33:36.6998547Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.6998977Z Traceback (most recent call last): 2025-12-04T12:33:36.6999430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.6999892Z method(*args, **kwargs) 2025-12-04T12:33:36.7000352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7000800Z method(*args, **kwargs) 2025-12-04T12:33:36.7001227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7001669Z with policy(): 2025-12-04T12:33:36.7002073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7002520Z raise RuntimeError(msg) 2025-12-04T12:33:36.7003417Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 230686720 and is now 281018368. 2025-12-04T12:33:36.7004280Z 2025-12-04T12:33:36.7004416Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7005076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T12:33:36.7005623Z 2025-12-04T12:33:36.7005784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7006160Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7006438Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7006658Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7006943Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7007315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7007632Z graph_break [] 2025-12-04T12:33:36.7007841Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T12:33:36.7008457Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml - 2025-12-04T12:33:36.7009072Z ================== 1 passed, 187 deselected, 1 rerun in 2.02s ================== 2025-12-04T12:33:36.7009327Z Got exit code 0 2025-12-04T12:33:36.7009571Z Test succeeded in new process, continuing with the rest of the tests 2025-12-04T12:33:36.7010101Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml 2025-12-04T12:33:36.7010582Z ============================= test session 
starts ============================== 2025-12-04T12:33:36.7010979Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:36.7011328Z cachedir: .pytest_cache 2025-12-04T12:33:36.7011742Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:36.7012190Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:33:36.7012400Z configfile: pytest.ini 2025-12-04T12:33:36.7012852Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:33:36.7013403Z collecting ... collected 188 items / 78 deselected / 110 selected 2025-12-04T12:33:36.7013701Z stepcurrent: skipping 78 already run items. 2025-12-04T12:33:36.7013933Z Running 110 items in this shard 2025-12-04T12:33:36.7014059Z 2025-12-04T12:33:36.7014707Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda W1204 12:33:13.688000 170576 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:33:36.7015506Z ('RERUN', {'yellow': True}) [1.6993s] [ 0%] 2025-12-04T12:33:36.7016054Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2742s] [ 0%] 2025-12-04T12:33:36.7016880Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2625s] [ 1%] 2025-12-04T12:33:36.7017896Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2683s] [ 2%] 2025-12-04T12:33:36.7018780Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2725s] [ 2%] 2025-12-04T12:33:36.7019587Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2803s] [ 3%] 2025-12-04T12:33:36.7020445Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2513s] [ 4%] 2025-12-04T12:33:36.7021311Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2483s] [ 4%] 2025-12-04T12:33:36.7022120Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2520s] [ 5%] 2025-12-04T12:33:36.7022922Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2490s] [ 6%] 2025-12-04T12:33:36.7023723Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2495s] [ 7%] 2025-12-04T12:33:36.7024523Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2489s] [ 8%] 2025-12-04T12:33:36.7025322Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2494s] [ 9%] 2025-12-04T12:33:36.7026255Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2485s] [ 10%] 2025-12-04T12:33:36.7027115Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 
('RERUN', {'yellow': True}) [0.2519s] [ 10%] 2025-12-04T12:33:36.7028076Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2496s] [ 10%] 2025-12-04T12:33:36.7028887Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2508s] [ 11%] 2025-12-04T12:33:36.7029693Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2491s] [ 12%] 2025-12-04T12:33:36.7030548Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2504s] [ 13%] 2025-12-04T12:33:36.7031405Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2510s] [ 13%] 2025-12-04T12:33:36.7032209Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2500s] [ 14%] 2025-12-04T12:33:36.7033060Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2503s] [ 15%] 2025-12-04T12:33:36.7033915Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2517s] [ 15%] 2025-12-04T12:33:36.7034713Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2529s] [ 16%] 2025-12-04T12:33:36.7035525Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2506s] [ 17%] 2025-12-04T12:33:36.7036317Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2515s] [ 18%] 2025-12-04T12:33:36.7037123Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2511s] [ 19%] 2025-12-04T12:33:36.7037923Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2518s] [ 20%] 2025-12-04T12:33:36.7038713Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2512s] [ 20%] 2025-12-04T12:33:36.7039499Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2530s] [ 21%] 2025-12-04T12:33:36.7040345Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2530s] [ 22%] 2025-12-04T12:33:36.7041146Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2514s] [ 23%] 2025-12-04T12:33:36.7041938Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2512s] [ 24%] 2025-12-04T12:33:36.7042726Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2523s] [ 25%] 2025-12-04T12:33:36.7043555Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2520s] [ 26%] 2025-12-04T12:33:36.7044491Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', 
{'yellow': True}) [0.2560s] [ 27%] 2025-12-04T12:33:36.7045446Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2503s] [ 27%] 2025-12-04T12:33:36.7046423Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2506s] [ 28%] 2025-12-04T12:33:36.7047312Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2552s] [ 29%] 2025-12-04T12:33:36.7048189Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2542s] [ 30%] 2025-12-04T12:33:36.7049082Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2542s] [ 30%] 2025-12-04T12:33:36.7049941Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2505s] [ 31%] 2025-12-04T12:33:36.7050791Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2541s] [ 32%] 2025-12-04T12:33:36.7051634Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2530s] [ 33%] 2025-12-04T12:33:36.7052481Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2530s] [ 34%] 2025-12-04T12:33:36.7053343Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda PASSED 
[0.2510s] [ 35%] 2025-12-04T12:33:36.7054196Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2539s] [ 36%] 2025-12-04T12:33:36.7055044Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2543s] [ 37%] 2025-12-04T12:33:36.7055954Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 38%] 2025-12-04T12:33:36.7056932Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 39%] 2025-12-04T12:33:36.7057907Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 40%] 2025-12-04T12:33:36.7058850Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 40%] 2025-12-04T12:33:36.7059606Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda PASSED [0.6188s] [ 41%] 2025-12-04T12:33:36.7060305Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2562s] [ 42%] 2025-12-04T12:33:36.7061157Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2538s] [ 43%] 2025-12-04T12:33:36.7062049Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2534s] [ 44%] 
2025-12-04T12:33:36.7062917Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2547s] [ 45%] 2025-12-04T12:33:36.7063778Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2539s] [ 46%] 2025-12-04T12:33:36.7064642Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2547s] [ 47%] 2025-12-04T12:33:36.7065480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2539s] [ 48%] 2025-12-04T12:33:36.7066314Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2540s] [ 49%] 2025-12-04T12:33:36.7067132Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2548s] [ 50%] 2025-12-04T12:33:36.7067951Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2549s] [ 50%] 2025-12-04T12:33:36.7068775Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2541s] [ 51%] 2025-12-04T12:33:36.7069587Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2574s] [ 52%] 2025-12-04T12:33:36.7070409Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2550s] [ 53%] 2025-12-04T12:33:36.7071242Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2557s] [ 54%] 2025-12-04T12:33:36.7072069Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2556s] [ 55%] 2025-12-04T12:33:36.7072888Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2559s] [ 56%] 2025-12-04T12:33:36.7073714Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2553s] [ 57%] 2025-12-04T12:33:36.7074545Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2559s] [ 58%] 2025-12-04T12:33:36.7075376Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2565s] [ 59%] 2025-12-04T12:33:36.7076204Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2570s] [ 60%] 2025-12-04T12:33:36.7077026Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2568s] [ 60%] 2025-12-04T12:33:36.7077841Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2561s] [ 61%] 2025-12-04T12:33:36.7078647Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2565s] [ 62%] 2025-12-04T12:33:36.7079459Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2568s] [ 63%] 2025-12-04T12:33:36.7080364Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2573s] [ 64%] 2025-12-04T12:33:36.7081251Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2578s] [ 65%] 2025-12-04T12:33:36.7082102Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2571s] [ 66%] 2025-12-04T12:33:36.7082961Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2570s] [ 67%] 2025-12-04T12:33:36.7083773Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2571s] [ 68%] 2025-12-04T12:33:36.7084582Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2577s] [ 69%] 2025-12-04T12:33:36.7085649Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0032s] (XPU does not support use_fast_accum=True for now) [ 70%] 2025-12-04T12:33:36.7086927Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0030s] (XPU does not support use_fast_accum=True for now) [ 70%] 2025-12-04T12:33:36.7088201Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0027s] (XPU does not support use_fast_accum=True for now) [ 71%] 2025-12-04T12:33:36.7089469Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0026s] (XPU does not support use_fast_accum=True for now) [ 72%] 2025-12-04T12:33:36.7090718Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0027s] (XPU does not support use_fast_accum=True for now) [ 73%] 2025-12-04T12:33:36.7091954Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 74%] 2025-12-04T12:33:36.7093178Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 75%] 2025-12-04T12:33:36.7094408Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0028s] (XPU does not support use_fast_accum=True for now) [ 76%] 2025-12-04T12:33:36.7095640Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 77%] 2025-12-04T12:33:36.7096870Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 78%] 2025-12-04T12:33:36.7098100Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 79%] 2025-12-04T12:33:36.7099368Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0027s] (XPU does not support use_fast_accum=True for now) [ 80%] 2025-12-04T12:33:36.7100653Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 80%] 2025-12-04T12:33:36.7101988Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 81%] 2025-12-04T12:33:36.7103253Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (bias is not supported when output dtype is float32) [ 82%] 2025-12-04T12:33:36.7104524Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0023s] (bias is not supported when output dtype is float32) [ 83%] 2025-12-04T12:33:36.7105771Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 84%] 2025-12-04T12:33:36.7106997Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 85%] 2025-12-04T12:33:36.7108222Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0026s] (bias is not supported when output dtype is float32) [ 86%] 2025-12-04T12:33:36.7109448Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0023s] (bias is not supported when output dtype is float32) [ 87%] 2025-12-04T12:33:36.7110668Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 88%] 2025-12-04T12:33:36.7111907Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 89%] 2025-12-04T12:33:36.7113129Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0023s] (bias is not supported when output dtype is float32) [ 90%] 2025-12-04T12:33:36.7114348Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0026s] (bias is not supported when output dtype is float32) [ 90%] 2025-12-04T12:33:36.7115526Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 91%] 2025-12-04T12:33:36.7116628Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 92%] 2025-12-04T12:33:36.7117945Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 93%] 2025-12-04T12:33:36.7119071Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 94%] 2025-12-04T12:33:36.7120327Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32 SKIPPED [0.0004s] (Need device-side TMA support in Triton) [ 95%] 2025-12-04T12:33:36.7121412Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 96%] 2025-12-04T12:33:36.7122467Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 97%] 2025-12-04T12:33:36.7123502Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 98%] 2025-12-04T12:33:36.7124641Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] failed while attempting to run meta for aten._scaled_mm.default 2025-12-04T12:33:36.7125563Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] Traceback (most recent call last): 2025-12-04T12:33:36.7126433Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T12:33:36.7127272Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] r = func(*args, **kwargs) 2025-12-04T12:33:36.7128026Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T12:33:36.7128804Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return self._op(*args, **kwargs) 2025-12-04T12:33:36.7129651Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6528, in meta_scaled_mm 2025-12-04T12:33:36.7130487Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return _check_scaled_mm_sizes( 2025-12-04T12:33:36.7131342Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6384, in _check_scaled_mm_sizes 
2025-12-04T12:33:36.7132153Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] torch._check( 2025-12-04T12:33:36.7132903Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1734, in _check 2025-12-04T12:33:36.7133804Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] _check_with(RuntimeError, cond, message) # pyrefly: ignore [bad-argument-type] 2025-12-04T12:33:36.7134718Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1716, in _check_with 2025-12-04T12:33:36.7135529Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] raise error_type(message_evaluated) 2025-12-04T12:33:36.7136311Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] RuntimeError: Expected self.size(1) to be divisible by 16, but got self.size(1)=15 2025-12-04T12:33:36.7136878Z PASSED [0.2020s] [ 99%] 2025-12-04T12:33:36.7137607Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] failed while attempting to run meta for aten._scaled_mm.default 2025-12-04T12:33:36.7138645Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] Traceback (most recent call last): 2025-12-04T12:33:36.7139496Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T12:33:36.7140329Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] r = func(*args, **kwargs) 
2025-12-04T12:33:36.7141090Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T12:33:36.7141865Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return self._op(*args, **kwargs) 2025-12-04T12:33:36.7142700Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6528, in meta_scaled_mm 2025-12-04T12:33:36.7143527Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return _check_scaled_mm_sizes( 2025-12-04T12:33:36.7144386Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6498, in _check_scaled_mm_sizes 2025-12-04T12:33:36.7145209Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] torch._check( 2025-12-04T12:33:36.7145949Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1734, in _check 2025-12-04T12:33:36.7146843Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] _check_with(RuntimeError, cond, message) # pyrefly: ignore [bad-argument-type] 2025-12-04T12:33:36.7147759Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1716, in _check_with 2025-12-04T12:33:36.7148560Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] raise error_type(message_evaluated) 2025-12-04T12:33:36.7150098Z E1204 12:33:35.012000 170576 
site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] RuntimeError: Invalid scaling configuration. For tensorwise scaling, both scales should be scalar. For rowwise scaling, scale_a should be (233, 1), scale_b should be (1, 128). For (BlockWise1x128, BlockWise128x128), scale_a should be (233, 1), scale_b should be (1, 1). For (BlockWise1x128, BlockWise1x128), scale_a should be (233, 1), scale_b should be (1, 128). Got scale_a.size()=(1, 128) and scale_b.size()=(233, 1) 2025-12-04T12:33:36.7151420Z PASSED [0.1924s] [100%] 2025-12-04T12:33:36.7151527Z 2025-12-04T12:33:36.7151620Z ==================================== RERUNS ==================================== 2025-12-04T12:33:36.7152073Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.7152508Z Traceback (most recent call last): 2025-12-04T12:33:36.7152960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7153402Z method(*args, **kwargs) 2025-12-04T12:33:36.7153861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7154349Z method(*args, **kwargs) 2025-12-04T12:33:36.7154759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7155233Z with policy(): 2025-12-04T12:33:36.7155630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7156109Z raise RuntimeError(msg) 2025-12-04T12:33:36.7157025Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. 
CUDA driver allocated memory was 230686720 and is now 278921216. 2025-12-04T12:33:36.7157890Z 2025-12-04T12:33:36.7158022Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7158694Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T12:33:36.7159234Z 2025-12-04T12:33:36.7159399Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7159772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7160089Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7160330Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7160619Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7160987Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7161310Z graph_break [] 2025-12-04T12:33:36.7161531Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T12:33:36.7161999Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.7162427Z Traceback (most recent call last): 2025-12-04T12:33:36.7162870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7163324Z method(*args, **kwargs) 2025-12-04T12:33:36.7163737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7164174Z method(*args, **kwargs) 2025-12-04T12:33:36.7164592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7165027Z with policy(): 2025-12-04T12:33:36.7165426Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7165869Z raise RuntimeError(msg) 2025-12-04T12:33:36.7166765Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368. 2025-12-04T12:33:36.7167615Z 2025-12-04T12:33:36.7167757Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7168222Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T12:33:36.7168227Z 2025-12-04T12:33:36.7168391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7168530Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7168600Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7168701Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7168974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7169092Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7169157Z graph_break [] 2025-12-04T12:33:36.7169301Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T12:33:36.7169577Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.7169691Z Traceback (most recent call last): 2025-12-04T12:33:36.7170000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7170081Z method(*args, **kwargs) 
2025-12-04T12:33:36.7170380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7170445Z method(*args, **kwargs) 2025-12-04T12:33:36.7170742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7170812Z with policy(): 2025-12-04T12:33:36.7171110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7171189Z raise RuntimeError(msg) 2025-12-04T12:33:36.7171975Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368. 2025-12-04T12:33:36.7171980Z 2025-12-04T12:33:36.7172112Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7172573Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T12:33:36.7172578Z 2025-12-04T12:33:36.7172747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7172877Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7172948Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7173050Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7173239Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7173354Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7173422Z graph_break [] 2025-12-04T12:33:36.7173530Z aten_mm_info 
[('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T12:33:36.7173813Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.7173887Z Traceback (most recent call last): 2025-12-04T12:33:36.7174205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7174280Z method(*args, **kwargs) 2025-12-04T12:33:36.7174578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7174652Z method(*args, **kwargs) 2025-12-04T12:33:36.7174952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7175021Z with policy(): 2025-12-04T12:33:36.7175338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7175410Z raise RuntimeError(msg) 2025-12-04T12:33:36.7176266Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368. 
2025-12-04T12:33:36.7176338Z 2025-12-04T12:33:36.7176471Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7176923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T12:33:36.7176962Z 2025-12-04T12:33:36.7177164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7177295Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7177363Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7177464Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7177643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7177760Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7177824Z graph_break [] 2025-12-04T12:33:36.7177926Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T12:33:36.7178204Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.7178277Z Traceback (most recent call last): 2025-12-04T12:33:36.7178578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7178661Z method(*args, **kwargs) 2025-12-04T12:33:36.7178958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7179025Z method(*args, **kwargs) 2025-12-04T12:33:36.7179317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7179376Z with policy(): 2025-12-04T12:33:36.7179676Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7179745Z raise RuntimeError(msg) 2025-12-04T12:33:36.7180523Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368. 2025-12-04T12:33:36.7180529Z 2025-12-04T12:33:36.7180655Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7181106Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T12:33:36.7181118Z 2025-12-04T12:33:36.7181278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7181406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7181491Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7181587Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7181768Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7181885Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7181945Z graph_break [] 2025-12-04T12:33:36.7182047Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T12:33:36.7182322Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.7182393Z Traceback (most recent call last): 2025-12-04T12:33:36.7182700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7182763Z method(*args, **kwargs) 
2025-12-04T12:33:36.7183108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7183213Z method(*args, **kwargs) 2025-12-04T12:33:36.7183508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7183602Z with policy(): 2025-12-04T12:33:36.7183901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7184000Z raise RuntimeError(msg) 2025-12-04T12:33:36.7184780Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368. 2025-12-04T12:33:36.7184784Z 2025-12-04T12:33:36.7184907Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7185371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T12:33:36.7185375Z 2025-12-04T12:33:36.7185533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7185663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7185751Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7185847Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7186024Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7186149Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7186209Z graph_break [] 2025-12-04T12:33:36.7186313Z aten_mm_info 
[('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T12:33:36.7186625Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T12:33:36.7186700Z Traceback (most recent call last): 2025-12-04T12:33:36.7187004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7187070Z method(*args, **kwargs) 2025-12-04T12:33:36.7187373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:36.7187446Z method(*args, **kwargs) 2025-12-04T12:33:36.7187740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:36.7187805Z with policy(): 2025-12-04T12:33:36.7188102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:36.7188169Z raise RuntimeError(msg) 2025-12-04T12:33:36.7189004Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 299892736. 
2025-12-04T12:33:36.7189014Z 2025-12-04T12:33:36.7189145Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:36.7189640Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T12:33:36.7189644Z 2025-12-04T12:33:36.7189800Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:36.7189931Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:36.7190000Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:33:36.7190136Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T12:33:36.7190371Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T12:33:36.7190481Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T12:33:36.7190541Z graph_break [] 2025-12-04T12:33:36.7190716Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T12:33:36.7191141Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml - 2025-12-04T12:33:36.7191284Z =========== 74 passed, 36 skipped, 78 deselected, 7 rerun in 22.59s ============ 2025-12-04T12:33:36.7191823Z The following tests failed and then succeeded when run in a new process['test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda'] 2025-12-04T12:33:36.7192149Z The following tests failed consistently: ['test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16'] 2025-12-04T12:33:36.7192155Z 2025-12-04T12:33:36.7192432Z FINISHED PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_041887d0b8d7fee8_.log) 2025-12-04T12:33:36.7192436Z 2025-12-04T12:33:36.7192617Z Finished 
inductor/test_fp8 1/1 ... [2025-12-04 12:33:36.640779][13256.569074556], took 2.35min 2025-12-04T12:33:36.7193048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml 2025-12-04T12:33:36.7844090Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml 2025-12-04T12:33:36.8111449Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml 2025-12-04T12:33:36.8409105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml 2025-12-04T12:33:36.8660677Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml 2025-12-04T12:33:36.8891342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml 2025-12-04T12:33:37.4412432Z Uploading logs for 57116084862 to S3 2025-12-04T12:33:37.6024080Z Uploading artifacts took 0.68 seconds 2025-12-04T12:33:37.6024449Z inductor/test_fp8 1/1 failed! 2025-12-04T12:33:37.6027334Z Running inductor/test_flex_flash 1/1 ... [2025-12-04 12:33:37.602523][13257.530819143] 2025-12-04T12:33:37.6027712Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:33:37.6031524Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_flash.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:33:37.602914] 2025-12-04T12:33:41.7745606Z 2025-12-04T12:33:41.7747190Z inductor/test_flex_flash 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_flash_1.1_143acf3e8eed598e_.log 2025-12-04T12:33:41.7770389Z Running 58 items in this shard: test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_kernel_called_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_kernel_called_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_mask_mod_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_mask_mod_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_capture_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_capture_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_basic_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_basic_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_block_mask_with_score_mod_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_block_mask_with_score_mod_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_kernel_called_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_kernel_called_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_dual_buffers_cuda_bfloat16, 
test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_dual_buffers_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_view_buffer_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_view_buffer_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_score_mod_with_many_buffer_indexing_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_score_mod_with_many_buffer_indexing_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_127_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_127_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_255_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_255_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_383_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_383_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_511_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_511_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_alibi_learned_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_alibi_learned_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_bias_cuda_bfloat16, 
test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_head_bias_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_head_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_block_mask_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_block_mask_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_doc_mask_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_doc_mask_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_dual_buffer_bias_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_dual_buffer_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_head_scale_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_head_scale_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_mask_mod_buffer_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_mask_mod_buffer_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_pos_bias_table_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_pos_bias_table_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_and_mask_buffers_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_and_mask_buffers_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_causal_cuda_bfloat16, 
test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_causal_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_rel_bias_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_rel_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_times_two_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_times_two_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_view_buffer_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_view_buffer_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_impl_error_with_requires_grad_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_impl_error_with_requires_grad_cuda_float16 2025-12-04T12:33:41.7788491Z 2025-12-04T12:33:41.7788706Z Finished inductor/test_flex_flash 1/1 ... [2025-12-04 12:33:41.774326][13261.70261849], took 0.07min 2025-12-04T12:33:41.7998913Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_flash/inductor.test_flex_flash-594b02277dbafddb.xml 2025-12-04T12:33:41.8314958Z Running inductor/test_segmented_tree 1/1 ... [2025-12-04 12:33:41.831263][13261.759562402] 2025-12-04T12:33:41.8315445Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:33:41.8318226Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_segmented_tree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:33:41.831546] 2025-12-04T12:33:45.3021862Z 2025-12-04T12:33:45.3022814Z inductor/test_segmented_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_segmented_tree_1.1_84df512657bb7938_.log 2025-12-04T12:33:45.3027026Z Running 12 items in this shard: test/inductor/test_segmented_tree.py::TestSegmentedTree::test_basic_construction, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_boundary_conditions, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_empty_array, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_full_array_range, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_invalid_ranges, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_max_query_matches_naive, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_multiple_operations, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_out_of_bounds, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_overlapping_updates, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_range_update, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_sequential_updates_and_queries, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_single_element_ranges 2025-12-04T12:33:45.3030075Z 2025-12-04T12:33:45.3030314Z Finished inductor/test_segmented_tree 1/1 ... [2025-12-04 12:33:45.301915][13265.230210049], took 0.06min 2025-12-04T12:33:45.3283997Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_segmented_tree/inductor.test_segmented_tree-3edcd06141f9e439.xml 2025-12-04T12:33:45.3583988Z Running inductor/test_kernel_optimization 1/1 ... 
[2025-12-04 12:33:45.358169][13265.286467138] 2025-12-04T12:33:45.3584626Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:33:45.3587701Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_kernel_optimization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:33:45.358490] 2025-12-04T12:33:56.9926748Z 2025-12-04T12:33:56.9927686Z inductor/test_kernel_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_kernel_optimization_1.1_55e099cc3f4c3e00_.log 2025-12-04T12:33:56.9928836Z Running 1 items in this shard: test/inductor/test_kernel_optimization.py::TestKernelOptimization::test_einsum_to_pointwise 2025-12-04T12:33:56.9929320Z 2025-12-04T12:33:56.9929631Z Finished inductor/test_kernel_optimization 1/1 ... [2025-12-04 12:33:56.992404][13276.92069971], took 0.19min 2025-12-04T12:33:57.0180820Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_optimization/inductor.test_kernel_optimization-b711ca038cf9dded.xml 2025-12-04T12:33:57.1062450Z Running inductor/test_metrics 1/1 ... [2025-12-04 12:33:57.105973][13277.034271488] 2025-12-04T12:33:57.1062996Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:33:57.1065692Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_metrics.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:33:57.106299] 2025-12-04T12:34:06.2366071Z 2025-12-04T12:34:06.2367765Z inductor/test_metrics 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_metrics_1.1_8516a4d7ea2a79fb_.log 2025-12-04T12:34:06.2370006Z Running 6 items in this shard: test/inductor/test_metrics.py::TestMetrics::test_atomic_add, test/inductor/test_metrics.py::TestMetrics::test_count_args, test/inductor/test_metrics.py::TestMetrics::test_count_pattern, test/inductor/test_metrics.py::TestMetrics::test_kernel_args_num_gb, test/inductor/test_metrics.py::TestMetrics::test_parse_proper_kernel_fn_code, test/inductor/test_metrics.py::TestMetrics::test_parse_reduction_hint 2025-12-04T12:34:06.2371557Z 2025-12-04T12:34:06.2371851Z Finished inductor/test_metrics 1/1 ... [2025-12-04 12:34:06.236186][13286.164474624], took 0.15min 2025-12-04T12:34:06.2622687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_metrics/inductor.test_metrics-b803ef0c4a9491e7.xml 2025-12-04T12:34:06.3352535Z Running export/test_unflatten_training_ir 1/1 ... [2025-12-04 12:34:06.334984][13286.263282146] 2025-12-04T12:34:06.3353067Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:34:06.3355612Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten_training_ir.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:34:06.335294] 2025-12-04T12:34:19.0719529Z 2025-12-04T12:34:19.0721141Z export/test_unflatten_training_ir 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_training_ir_1.1_381cd1c16fe9e11b_.log 2025-12-04T12:34:19.0734199Z Running 29 items in this shard: test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_assert_tensor_metadata_stack_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_attr_as_submod_input_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_dedup_sym_size_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_double_nested_submodule_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_duplicate_placeholder_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_fx_trace_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_nested_leaf_non_strict_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_placeholder_and_get_attr_ordering_after_unflattened_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_simple_alias_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_buffer_mutation_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_constant_obj_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_constant_tensor_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_container_type_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_eager_training_ir, 
test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_empty_branch_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_nested_access_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_nested_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_none_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_param_list_dict_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_preserve_signature_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_preserve_with_unused_input_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_requires_grad_param_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_root_module_type_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_shared_submodule_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_skipped_call_module_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_submodule_ordering_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_with_inplace_compile_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_wrong_input_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflattened_module_nodes_has_meta_val_training_ir 2025-12-04T12:34:19.0744643Z 2025-12-04T12:34:19.0744900Z Finished export/test_unflatten_training_ir 1/1 ... 
[2025-12-04 12:34:19.071772][13299.000066202], took 0.21min 2025-12-04T12:34:19.0979105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_unflatten_training_ir/export.test_unflatten_training_ir-0b06d16f89271e02.xml 2025-12-04T12:34:19.1755263Z Running inductor/test_triton_kernels 1/1 ... [2025-12-04 12:34:19.175254][13299.103552035] 2025-12-04T12:34:19.1756023Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:34:19.1758421Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:34:19.175551] 2025-12-04T12:36:34.8784508Z 2025-12-04T12:36:34.8787460Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_cd88835deb58b6d4_.log 2025-12-04T12:36:34.8908368Z Running 366 items in this shard: test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_i64_input, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_single, 
test/inductor/test_triton_kernels.py::KernelTests::test_layout_constraint_needs_fixed_stride_order, test/inductor/test_triton_kernels.py::KernelTests::test_no_nan_kernels, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_new, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_new, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_triton_attrs_dict_equal_1_None_format, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching_duplicate, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_constants, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dependancies, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_cpp_wrapper, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_normal, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_inductor, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_mm_kernels_do_not_change, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_unaffected, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_fallback, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float16, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float32, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float64, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_global_constexpr, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_higher_order_func, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inputs_buffer_reuse, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_matmul_tracking, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_not_mark_dirty, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_type, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_none_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_out_of_order, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_reinplace_inplaceable_pass, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_False, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_slice_and_view_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input_nonzero_offset, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_to_cpu, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_various_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_constexpr_function, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_inductor, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol_with_custom_name, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_kernel_param, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop2, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_new_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_old_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_argmax, test/inductor/test_triton_kernels.py::MutationTests::test_branch_with_multiple_yield_args, 
test/inductor/test_triton_kernels.py::MutationTests::test_cumsum, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_one_return, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg_2, test/inductor/test_triton_kernels.py::MutationTests::test_get_tma_stores, test/inductor/test_triton_kernels.py::MutationTests::test_labels, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_4_times_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_2d_autotuned, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_block_ptr, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_import, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_atomic_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel1, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_false, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_true, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_kernel_with_block_ptr_2d, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_mul2_inplace_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_nested_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel_call, 
test/inductor/test_triton_kernels.py::MutationTests::test_reduce_sum, test/inductor/test_triton_kernels.py::MutationTests::test_triton_kernel_inference_mode, test/inductor/test_triton_kernels.py::MutationTests::test_while_loop, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_no_pre_or_post_hook_user_defined, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_meta, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_mutable_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_triton_kernel, test/inductor/test_triton_kernels.py::CustomOpTests::test_subclass, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_dynamic_grid_no_recompile, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_inductor, test/inductor/test_triton_kernels.py::CustomOpTests::test_wrap_triton_disabled_in_triton_op 2025-12-04T12:36:34.9024952Z 2025-12-04T12:36:34.9025200Z Finished inductor/test_triton_kernels 1/1 ... [2025-12-04 12:36:34.878955][13434.807247143], took 2.26min 2025-12-04T12:36:34.9049780Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-31757cb9ac5c1c41.xml 2025-12-04T12:36:35.0500053Z Running inductor/test_lookup_table 1/1 ... 
[2025-12-04 12:36:35.049751][13434.978050062] 2025-12-04T12:36:35.0500528Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:36:35.0503186Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_lookup_table.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:35.050050] 2025-12-04T12:36:40.3321926Z 2025-12-04T12:36:40.3322853Z inductor/test_lookup_table 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_lookup_table_1.1_5735430f25b64d36_.log 2025-12-04T12:36:40.3323536Z 2025-12-04T12:36:40.3323828Z Finished inductor/test_lookup_table 1/1 ... [2025-12-04 12:36:40.331958][13440.260255284], took 0.09min 2025-12-04T12:36:40.3578349Z Running inductor/test_cutedsl_template 1/1 ... [2025-12-04 12:36:40.357585][13440.285884993] 2025-12-04T12:36:40.3578851Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:36:40.3582464Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:36:40.357902] 2025-12-04T12:36:45.7814972Z 2025-12-04T12:36:45.7817563Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_aa2751f33400ad4f_.log 2025-12-04T12:36:45.7825077Z Running 13 items in this shard: test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cse_integration, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e_autotune, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_op_overrides, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_defines, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_get_output_hook, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_indented_buffer_usage, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_modification_subgraph, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_multiple_templates_unique_names, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_render_includes_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_aliasing, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_env_contains_hooks 2025-12-04T12:36:45.7828982Z 2025-12-04T12:36:45.7829235Z Finished inductor/test_cutedsl_template 1/1 ... [2025-12-04 12:36:45.781098][13445.709393071], took 0.09min 2025-12-04T12:36:45.8070139Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-0fae1aaa4a2003eb.xml 2025-12-04T12:36:45.9004352Z Running inductor/test_benchmark_fusion 1/1 ... 
[2025-12-04 12:36:45.900153][13445.828452745] 2025-12-04T12:36:45.9004853Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:36:45.9007504Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:45.900462] 2025-12-04T12:37:05.1504208Z 2025-12-04T12:37:05.1505998Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_e074574f7d298815_.log 2025-12-04T12:37:05.1513364Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_extern_code, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, 
test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu 2025-12-04T12:37:05.1518396Z 2025-12-04T12:37:05.1518736Z Finished inductor/test_benchmark_fusion 1/1 ... [2025-12-04 12:37:05.149956][13465.078248761], took 0.32min 2025-12-04T12:37:05.1763301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-bc092756daa3cb92.xml 2025-12-04T12:37:05.2611289Z Running export/test_serdes 1/1 ... [2025-12-04 12:37:05.260872][13465.189170985] 2025-12-04T12:37:05.2611725Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:37:05.2614385Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:37:05.261180] 2025-12-04T12:39:51.3497880Z 2025-12-04T12:39:51.3499196Z export/test_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serdes_1.1_e8663fe68509169e_.log 2025-12-04T12:39:51.3777908Z Running 880 items in this shard: test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_assume_static_by_default_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_maxsize_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_no_grad_param_inplace_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_maxsize_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportTestExport::test__scaled_dot_product_flash_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_additional_inputs_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_annotate_on_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_args_type_checked_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_aten_lift_fresh_copy_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attr_assignment_extra_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_constrain_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_baddbmm_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_fake_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_real_tensor_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_buffer_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_torch_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_wrong_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ccode_python_mod_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cdist_forward_compute_mode_zero_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_check_specialized_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_checks_to_constrain_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cleanup_dynamic_markers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colin_unbacked_backed_vr_sub_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colon_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_compiling_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_access_identical_symint_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_constant_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_same_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_contains_unbacked_no_escape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_int_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_unflatten_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_input_naming_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_no_user_inp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_dup_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_requires_grad_const_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_in_eager_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_constrain_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_various_cases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_conv_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_crop_like_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cse_for_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_preserve_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_tag_metadata_re_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_batch_norm_functional_predispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_after_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_before_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_default_decomposition_core_cia_ops_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_integer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_strict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_gpu_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_float_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_auto_and_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_divisibility_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_range_violations_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_ok_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_into_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_reduce_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_to_all_single_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_distributed_reduce_scatter_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dont_duck_size_for_auto_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_double_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_list_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_with_nan_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_fake_kernel_inference_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_infers_fake_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_lr_shift_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_bounds_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_dataclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_inferred_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_generic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_user_errors_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_various_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_spec_with_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_sym_round_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ends_of_bounds_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_enum_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_does_not_reference_eager_fallback_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_when_passing_mutating_primitive_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_exception_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_expand_copy_export_handles_implicit_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_api_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_as_backend_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_lifted_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_scandim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_symbool_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_warns_constant_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_basic_pop_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_container_methods_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_op_lib_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_mutable_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cyclic_reference_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_dynamo_config_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_run_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_state_dict_hooks_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_default_kwargs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_keyword_only_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_pytree_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_pytree_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_postional_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_function_schema_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_graph_with_no_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_bug_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_static_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_leak_compile_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_linear_preserve_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_onnx_reported_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_mod_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_at_aot_level_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_but_not_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_rnn_variants_with_warning_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_scan_pytree_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_script_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_statically_known_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_then_compile_tensor_ctor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_autocast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_complex_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_set_grad_enabled_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_wrong_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_external_call_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_weights_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_filter_traceback_frames_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_flex_attention_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_from_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_fqn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_from_node_metadata_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_full_on_scalar_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_function_holding_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hints_wrapper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hoo_inline_users_issue_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_post_autograd_op_preserved_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inductor_backend_inside_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_recursive_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_int_shape_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_intermediate_shape_comp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_exporting_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_nonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_isnonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_113041_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_157289_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_issue_161902_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_istft_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_invalid_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwargs_reorder_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_norm_unbacked_normalized_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_sharing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lazy_module_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_linear_conv_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_malformed_fqn_from_source_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mask_nonzero_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_masked_select_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_math_pow_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mismatched_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mixed_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_dict_key_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_module_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_list_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_with_dict_container_inp_out_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_modules_access_for_deleted_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_more_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multinomial_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multiple_definitions_same_name_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_namedtuple_input_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_native_multi_attention_head_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_dynamic_shapes_spec_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_fake_tensor_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_constant_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_init_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_shared_submodule_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_no_check_is_size_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_persistent_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_none_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonstrict_retrace_preserves_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_not_registered_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_operator_aten_tensor_mode_variant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_output_node_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pad_sequence_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_param_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_partial_patched_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_variadic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_update_preserving_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_grad_wrappers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_annotation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_requires_grad_placeholders_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_profiling_code_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_python_asserts_with_sym_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_nested_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_range_constraints_with_replacement_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_alias_dtype_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_bool_cast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_for_max_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_size_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_assert_max_upper_bound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_register_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_repeat_interleave_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replaced_unbacked_bindings_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_reshape_view_helper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retracable_ep_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retrace_pre_autograd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decomposition_supports_user_input_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prim_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prm_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_with_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sdpa_gqa_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sequential_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_example_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_as_side_effect_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_empty_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_setgrad_lifted_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_shared_submodule_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_export_for_training_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_unbacked_view_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_size_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_slice_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_solver_unsupported_sympy_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_specialize_derived_dim_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_split_const_gm_with_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_make_fx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_primitives_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_shape_attribute_assignment_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_state_tensors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_static_dim_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_context_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_regular_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_new_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_float_operators_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_or_sym_and_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_sqrt_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symbool_item_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_symfloat_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_additional_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_shapes_collection_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_tensor_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tag_ac_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_attribute_zero_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_aten_to_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_with_wrapped_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tolist_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_check_eq_commutativity_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_fn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_trace_under_fake_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_train_eval_on_exported_preautograd_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tril_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_triu_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_3d_matmul_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_deferred_runtime_retrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_expand_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_infer_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_kth_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_linear_layer_norm_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_noncontig_lin_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_pad_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_scalar_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_passthrough_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_unsqueeze_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_isinstance_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_no_unroll_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_buf_8_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_aliases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_use_embedding_twice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_user_input_and_buffer_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_custom_autograd_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_to_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_where_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_assert_separation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_index_assertions_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_tensor_constant_idx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_wrapper_module_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test__scaled_dot_product_flash_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_additional_inputs_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_annotate_on_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_args_type_checked_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_aten_lift_fresh_copy_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attr_assignment_extra_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_constrain_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_baddbmm_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_fake_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_real_tensor_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_buffer_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_wrong_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ccode_python_mod_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cdist_forward_compute_mode_zero_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_check_specialized_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_checks_to_constrain_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cleanup_dynamic_markers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colon_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_compiling_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_access_identical_symint_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_constant_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_same_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_int_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_input_naming_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_no_user_inp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_dup_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_requires_grad_const_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_in_eager_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_constrain_value_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_various_cases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_conv_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_crop_like_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cse_for_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_preserve_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_tag_metadata_re_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_default_decomposition_core_cia_ops_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_integer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_nested_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_strict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_gpu_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_float_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_auto_and_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_divisibility_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_specialization_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_range_violations_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_ok_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_into_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_reduce_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_to_all_single_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_double_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_list_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_infers_fake_kernel_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_lr_shift_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_bounds_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_dataclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_various_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_sym_round_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ends_of_bounds_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_enum_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_exception_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_api_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_as_backend_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_symbool_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_warns_constant_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_op_lib_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cyclic_reference_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_dynamo_config_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_run_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_default_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_keyword_only_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_pytree_kwargs_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_postional_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_function_schema_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_graph_with_no_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_bug_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_static_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_leak_compile_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_onnx_reported_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_mod_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_rnn_variants_with_warning_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_scan_pytree_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_script_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_statically_known_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_then_compile_tensor_ctor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_autocast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_complex_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_set_grad_enabled_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_wrong_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_external_call_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_weights_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_filter_traceback_frames_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_flex_attention_export_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_from_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fqn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_from_node_metadata_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_full_on_scalar_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_function_holding_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hints_wrapper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hoo_inline_users_issue_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_post_autograd_op_preserved_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inductor_backend_inside_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_recursive_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_int_shape_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_intermediate_shape_comp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_exporting_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_nonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_isnonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_113041_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_157289_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_161902_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_istft_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_invalid_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwargs_reorder_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_sharing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lazy_module_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_linear_conv_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_malformed_fqn_from_source_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mask_nonzero_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_masked_select_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_math_pow_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mismatched_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mixed_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_dict_key_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_list_slice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_with_dict_container_inp_out_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_modules_access_for_deleted_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_more_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multinomial_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multiple_definitions_same_name_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_namedtuple_input_export_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_native_multi_attention_head_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_dynamic_shapes_spec_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_fake_tensor_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_constant_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_init_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_check_is_size_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_persistent_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_none_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_not_registered_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_output_node_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pad_sequence_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_param_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_partial_patched_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_variadic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_update_preserving_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_grad_wrappers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_annotation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_requires_grad_placeholders_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_profiling_code_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_python_asserts_with_sym_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_nested_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_range_constraints_with_replacement_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_bool_cast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_for_max_op_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_size_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_assert_max_upper_bound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_register_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_repeat_interleave_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replaced_unbacked_bindings_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_reshape_view_helper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retracable_ep_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retrace_pre_autograd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prm_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_with_size_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sdpa_gqa_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sequential_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_example_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_as_side_effect_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_empty_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_setgrad_lifted_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_shared_submodule_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_export_for_training_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_unbacked_view_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_size_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_slice_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_solver_unsupported_sympy_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_specialize_derived_dim_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_make_fx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_primitives_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_shape_attribute_assignment_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_tensors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_static_dim_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_context_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_new_roots_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_float_operators_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_or_sym_and_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_sqrt_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symbool_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symfloat_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_additional_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_shapes_collection_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_tensor_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tag_ac_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_attribute_zero_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_aten_to_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tolist_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_check_eq_commutativity_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_fn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_trace_under_fake_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tril_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_triu_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_3d_matmul_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_expand_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_infer_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_kth_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_noncontig_lin_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_pad_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_scalar_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_passthrough_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_unsqueeze_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_isinstance_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_no_unroll_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_buf_8_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_aliases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_use_embedding_twice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_user_input_and_buffer_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_custom_autograd_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_to_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_where_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_assert_separation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_index_assertions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_tensor_constant_idx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_wrapper_module_serdes_nonstrict 2025-12-04T12:39:51.4044117Z 2025-12-04T12:39:51.4044446Z Finished export/test_serdes 1/1 ... 
[2025-12-04 12:39:51.351451][13631.27974505], took 2.77min 2025-12-04T12:39:51.4045239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serdes/export.test_serdes-707c3510a208c1b4.xml 2025-12-04T12:39:51.4860780Z Running inductor/test_control_deps 1/1 ... [2025-12-04 12:39:51.485816][13631.414113475] 2025-12-04T12:39:51.4861524Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:39:51.4864089Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_deps.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:39:51.486104] 2025-12-04T12:40:00.5667207Z 2025-12-04T12:40:00.5668696Z inductor/test_control_deps 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_deps_1.1_8b7ff51ad1c7850a_.log 2025-12-04T12:40:00.5671085Z Running 1 items in this shard: test/inductor/test_control_deps.py::TestControlDeps::test_control_deps_prevents_fusion 2025-12-04T12:40:00.5672124Z 2025-12-04T12:40:00.5672758Z Finished inductor/test_control_deps 1/1 ... [2025-12-04 12:40:00.566303][13640.494597899], took 0.15min 2025-12-04T12:40:00.5936950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-c5f7586c63bf6a73.xml 2025-12-04T12:40:00.6625995Z Running inductor/test_benchmarking 1/1 ... [2025-12-04 12:40:00.662335][13640.590634834] 2025-12-04T12:40:00.6626481Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:40:00.6629033Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmarking.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:40:00.662641] 2025-12-04T12:40:07.2383295Z 2025-12-04T12:40:07.2384192Z inductor/test_benchmarking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmarking_1.1_829d05e5f8b311c4_.log 2025-12-04T12:40:07.2389454Z Running 12 items in this shard: test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cuda, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cuda 2025-12-04T12:40:07.2393689Z 2025-12-04T12:40:07.2393945Z Finished inductor/test_benchmarking 1/1 ... 
[2025-12-04 12:40:07.238034][13647.166325401], took 0.11min 2025-12-04T12:40:07.2648462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-a0634e57d5b5356a.xml 2025-12-04T12:40:07.3420559Z Running inductor/test_helion_kernels 1/1 ... [2025-12-04 12:40:07.341821][13647.270119503] 2025-12-04T12:40:07.3421033Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:40:07.3424177Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_helion_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:07.342137] 2025-12-04T12:40:12.7156760Z 2025-12-04T12:40:12.7157977Z inductor/test_helion_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_helion_kernels_1.1_570e1b539a64331c_.log 2025-12-04T12:40:12.7159280Z Running 2 items in this shard: test/inductor/test_helion_kernels.py::HelionTests::test_add_kernel, test/inductor/test_helion_kernels.py::HelionTests::test_softmax_view_reshape 2025-12-04T12:40:12.7159933Z 2025-12-04T12:40:12.7160330Z Finished inductor/test_helion_kernels 1/1 ... [2025-12-04 12:40:12.715429][13652.64372413], took 0.09min 2025-12-04T12:40:12.7429698Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-31b2988c053252a6.xml 2025-12-04T12:40:12.7719248Z Running inductor/test_quantization 1/1 ... 
[2025-12-04 12:40:12.771679][13652.699978231] 2025-12-04T12:40:12.7719732Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:40:12.7723119Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_quantization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:12.772006] 2025-12-04T12:40:24.7069868Z 2025-12-04T12:40:24.7070935Z inductor/test_quantization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_quantization_1.1_121915a11d71492f_.log 2025-12-04T12:40:24.7072475Z Running 2 items in this shard: test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_with_scaling, test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_without_scaling 2025-12-04T12:40:24.7073410Z 2025-12-04T12:40:24.7073702Z Finished inductor/test_quantization 1/1 ... [2025-12-04 12:40:24.706710][13664.635006436], took 0.20min 2025-12-04T12:40:24.7335903Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-84688fa1768a881f.xml 2025-12-04T12:40:24.8085738Z Running inductor/test_best_config 1/1 ... [2025-12-04 12:40:24.808331][13664.736629886] 2025-12-04T12:40:24.8086237Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:40:24.8089108Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_best_config.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:40:24.808647] 2025-12-04T12:40:32.0853792Z 2025-12-04T12:40:32.0854884Z inductor/test_best_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_best_config_1.1_4e63c9241f189413_.log 2025-12-04T12:40:32.0855974Z Running 1 items in this shard: test/inductor/test_best_config.py::TestKernelBestConfig::test_best_config_has_triton_cache_key 2025-12-04T12:40:32.0856462Z 2025-12-04T12:40:32.0856753Z Finished inductor/test_best_config 1/1 ... [2025-12-04 12:40:32.084945][13672.013234901], took 0.12min 2025-12-04T12:40:32.1123736Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_best_config/inductor.test_best_config-423c961bf0014ead.xml 2025-12-04T12:40:32.1853969Z Running export/test_tools 1/1 ... [2025-12-04 12:40:32.185159][13672.113457681] 2025-12-04T12:40:32.1854403Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:40:32.1857497Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tools.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:32.185466] 2025-12-04T12:40:36.0061262Z 2025-12-04T12:40:36.0062088Z export/test_tools 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tools_1.1_3c90f78b66d684d8_.log 2025-12-04T12:40:36.0063288Z Running 2 items in this shard: test/export/test_tools.py::TestExportTools::test_report_exportability_basic, test/export/test_tools.py::TestExportTools::test_report_exportability_with_issues 2025-12-04T12:40:36.0064292Z 2025-12-04T12:40:36.0064665Z Finished export/test_tools 1/1 ... 
[2025-12-04 12:40:36.005906][13675.934201134], took 0.06min 2025-12-04T12:40:36.0332040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tools/export.test_tools-21258e52645f11ac.xml 2025-12-04T12:40:36.0626181Z Running inductor/test_compiled_optimizers 1/3 ... [2025-12-04 12:40:36.062393][13675.990691408] 2025-12-04T12:40:36.0626664Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:40:36.0629422Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_optimizers.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:36.062683] 2025-12-04T12:47:42.7006572Z 2025-12-04T12:47:42.7007883Z inductor/test_compiled_optimizers 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_optimizers_1.3_97f7ba3c63654c1d_.log 2025-12-04T12:47:42.7099179Z Running 248 items in this shard: test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_steplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_initial_accumulator_value_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_lr_decay_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_polynomiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_exponentiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_cosineannealinglr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_multisteplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealinglr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_multisteplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_t0_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multisteplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_closure_graph_break, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_foreach_map_adam, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_momentum_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multisteplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_step_sizes_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_steplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_recompile_single, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_reducelronplateau, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_ASGD_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adagrad_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adamax_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_LBFGS_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Muon_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_NAdam_use_closure_False_cuda_float32, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RAdam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RMSprop_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Rprop_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SparseAdam_use_closure_False_cuda_float32 2025-12-04T12:47:42.7187843Z 2025-12-04T12:47:42.7188117Z Finished inductor/test_compiled_optimizers 1/3 ... [2025-12-04 12:47:42.700896][14102.629190424], took 7.11min 2025-12-04T12:47:42.7287132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-bb39f909a1ebabd7.xml 2025-12-04T12:47:42.8070619Z Running inductor/test_aot_inductor_custom_ops 1/1 ... [2025-12-04 12:47:42.806799][14102.735096986] 2025-12-04T12:47:42.8071455Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:47:42.8073901Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_custom_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:47:42.807120] 2025-12-04T12:50:06.6132253Z 2025-12-04T12:50:06.6133895Z inductor/test_aot_inductor_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_custom_ops_1.1_dddd6a5d20b7cc0a_.log 2025-12-04T12:50:06.6148461Z Running 35 items in this shard: test/inductor/test_aot_inductor_custom_ops.py::AOTInductorLoggingTest::test_shape_env_reuse, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_boxed_run_inputs_clearing_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_add_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_add_output_path_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_all_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_missing_arg_with_default_value_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_out_variant_without_return_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_return_list_of_single_tensor_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_return_single_tensor_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_square_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_concat_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_multiple_outputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_reinterpret_view_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_int_output_cpu, 
test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_nullopt_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_output_2_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_incorrect_custom_op_schema_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_boxed_run_inputs_clearing_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_add_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_add_output_path_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_all_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_missing_arg_with_default_value_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_out_variant_without_return_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_return_list_of_single_tensor_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_return_single_tensor_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_square_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_with_concat_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_with_multiple_outputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_with_reinterpret_view_inputs_cuda, 
test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_int_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_optional_tensor_nullopt_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_optional_tensor_output_2_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_optional_tensor_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_incorrect_custom_op_schema_cuda 2025-12-04T12:50:06.6160704Z 2025-12-04T12:50:06.6160971Z Finished inductor/test_aot_inductor_custom_ops 1/1 ... [2025-12-04 12:50:06.613176][14246.541472903], took 2.40min 2025-12-04T12:50:06.6408044Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_custom_ops/inductor.test_aot_inductor_custom_ops-a7a529277f9f9a31.xml 2025-12-04T12:50:06.7211840Z Running inductor/test_control_flow 4/5 ... [2025-12-04 12:50:06.720932][14246.64923176] 2025-12-04T12:50:06.7212301Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:50:06.7215090Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=4', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:50:06.721231] 2025-12-04T12:57:58.2014294Z 2025-12-04T12:57:58.2059952Z inductor/test_control_flow 4/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_4.5_21e8d4f49de459e9_.log 2025-12-04T12:57:58.2108133Z Running 129 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_infinite_loop_error, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_models_with_mixed_device_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_True, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_mismatch_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_cpu, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_True, 
test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_True_autograd_True 2025-12-04T12:57:58.2205692Z 2025-12-04T12:57:58.2207435Z Finished inductor/test_control_flow 4/5 ... [2025-12-04 12:57:58.220483][14718.14877457], took 7.86min 2025-12-04T12:57:58.2477207Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-955a1f614439a238.xml 2025-12-04T12:57:59.2872948Z Uploading artifacts took 0.95 seconds 2025-12-04T12:57:59.2876953Z Running dynamo/test_cudagraphs 1/1 ... [2025-12-04 12:57:59.287483][14719.215779635] 2025-12-04T12:57:59.2877665Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:57:59.2881198Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:57:59.287857] 2025-12-04T12:58:04.4617621Z 2025-12-04T12:58:04.4618638Z dynamo/test_cudagraphs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_1.1_90173d668b3e025f_.log 2025-12-04T12:58:04.4621531Z Running 8 items in this shard: test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_basic, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dead_fill, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dtoh, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_factory, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_htod, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_constant, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_input, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutated_metadata 2025-12-04T12:58:04.4623764Z 2025-12-04T12:58:04.4624059Z Finished dynamo/test_cudagraphs 1/1 ... [2025-12-04 12:58:04.461496][14724.38979316], took 0.09min 2025-12-04T12:58:04.4890310Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-e1e65156309c3950.xml 2025-12-04T12:58:04.5234012Z Running inductor/test_alignment 1/1 ... [2025-12-04 12:58:04.523166][14724.451465504] 2025-12-04T12:58:04.5234639Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:58:04.5237287Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_alignment.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:58:04.523475] 2025-12-04T12:58:17.2109759Z 2025-12-04T12:58:17.2111052Z inductor/test_alignment 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_alignment_1.1_a164af8fc87f74b5_.log 2025-12-04T12:58:17.2390145Z Running 12 items in this shard: test/inductor/test_alignment.py::GPUTests::test_Q4_K_dequantization_cuda, test/inductor/test_alignment.py::GPUTests::test_alignment_without_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_incorrect_meta_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1024_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1048576_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_128_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_cuda, test/inductor/test_alignment.py::GPUTests::test_view_dtype_slice_cuda 2025-12-04T12:58:17.2394741Z 2025-12-04T12:58:17.2395139Z Finished inductor/test_alignment 1/1 ... [2025-12-04 12:58:17.210639][14737.138930421], took 0.21min 2025-12-04T12:58:17.2396414Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e4d33245d4b5500b.xml 2025-12-04T12:58:17.3236417Z Running dynamo/test_guard_serialization 1/1 ... 
[2025-12-04 12:58:17.323362][14737.251660909] 2025-12-04T12:58:17.3237094Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:58:17.3239852Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_serialization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:17.323698] 2025-12-04T12:58:31.3646902Z 2025-12-04T12:58:31.3648106Z dynamo/test_guard_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_serialization_1.1_1b7b87cb0989e9eb_.log 2025-12-04T12:58:31.3664243Z Running 56 items in this shard: test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bool_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_patched_forward, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_empty, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_missing, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_builtin_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_c10d_work, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_class_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_var_missing, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_constant_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_ddp_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_default_device, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_deterministic_algorithms, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_contains, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_version, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dispatch_key_set_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dual_level, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_duplicate_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_empty_nn_module_hooks_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_equals_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_fsdp_training_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_locals, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_with_wrong_fqn, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_functorch_stack_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_global_state_guard_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode_loading, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_guard_on_key_order_with_cache, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_hasattr_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match_with_config, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_mapping_keys_check, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_nn_module, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_none_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_not_present_in_generic_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_range_iterator_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sdp_backend_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sequence_length, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_shape_env, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_skipped_objects, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_subclass_metadata_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tuple_iterator_len, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_type_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_sharded_tensor, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_submodule, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_process_group, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_stream, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_weakref, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_weakref_alive, test/dynamo/test_guard_serialization.py::TestGuardSerializationFSDP::test_guard_serialization_fsdp_module 2025-12-04T12:58:31.3679078Z 2025-12-04T12:58:31.3679327Z Finished dynamo/test_guard_serialization 1/1 ... 
[2025-12-04 12:58:31.364396][14751.292682352], took 0.23min 2025-12-04T12:58:31.3934422Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-212782fb09fba1b6.xml 2025-12-04T12:58:31.4686186Z Running inductor/test_needs_exact_strides 1/1 ... [2025-12-04 12:58:31.468377][14751.39667498] 2025-12-04T12:58:31.4686686Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:58:31.4689567Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_needs_exact_strides.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:31.468687] 2025-12-04T12:58:40.7491328Z 2025-12-04T12:58:40.7492951Z inductor/test_needs_exact_strides 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_needs_exact_strides_1.1_5109866c57595440_.log 2025-12-04T12:58:40.7494799Z Running 2 items in this shard: test/inductor/test_needs_exact_strides.py::TestNeedsExactStrides::test_custom_op_float32, test/inductor/test_needs_exact_strides.py::TestNeedsExactStrides::test_custom_op_float8_e8m0fnu 2025-12-04T12:58:40.7495444Z 2025-12-04T12:58:40.7495704Z Finished inductor/test_needs_exact_strides 1/1 ... [2025-12-04 12:58:40.748774][14760.677066773], took 0.15min 2025-12-04T12:58:40.7773727Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_needs_exact_strides/inductor.test_needs_exact_strides-9df9f4b52deeb26d.xml 2025-12-04T12:58:40.8651353Z Running inductor/test_auto_functionalize 1/1 ... 
[2025-12-04 12:58:40.864883][14760.793181452] 2025-12-04T12:58:40.8651874Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:58:40.8654926Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_auto_functionalize.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:40.865212] 2025-12-04T12:59:03.9199293Z 2025-12-04T12:59:03.9202313Z inductor/test_auto_functionalize 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_auto_functionalize_1.1_68ce77f985508c50_.log 2025-12-04T12:59:03.9225587Z Running 39 items in this shard: test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_input_to_custom_op, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_output, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_default, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_none_return, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra1, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra3, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra4, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra5, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_old, 
test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_on_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_self_as_mutate_arg, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_tensorlist, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_can_auto_functionalize, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_graph_input_is_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode1_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode4_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_recompile, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_scheduling_with_multiple_mutates, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice, 
test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_try_use_slice, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_unbacked_auto_functionalize_op 2025-12-04T12:59:03.9245908Z 2025-12-04T12:59:03.9246365Z Finished inductor/test_auto_functionalize 1/1 ... [2025-12-04 12:59:03.919598][14783.847889142], took 0.38min 2025-12-04T12:59:03.9507058Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_auto_functionalize/inductor.test_auto_functionalize-2721b330ad87dcbb.xml 2025-12-04T12:59:04.0322600Z Running dynamo/test_modes 1/1 ... [2025-12-04 12:59:04.032007][14783.960305363] 2025-12-04T12:59:04.0323093Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:59:04.0325714Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_modes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:59:04.032311] 2025-12-04T12:59:23.4307116Z 2025-12-04T12:59:23.4309136Z dynamo/test_modes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_modes_1.1_8c4add47a33e36b7_.log 2025-12-04T12:59:23.4319429Z Running 29 items in this shard: test/dynamo/test_modes.py::TorchDispatchModeTests::test_skip_torch_dispatch_modes, test/dynamo/test_modes.py::TorchDispatchModeTests::test_torch_dispatch_ignore_compile_internals, test/dynamo/test_modes.py::TorchFunctionModeTests::test_builtin_equivalent_funcs, test/dynamo/test_modes.py::TorchFunctionModeTests::test_error_empty_stack_pop_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_expand, test/dynamo/test_modes.py::TorchFunctionModeTests::test_flex_attention, test/dynamo/test_modes.py::TorchFunctionModeTests::test_hop, test/dynamo/test_modes.py::TorchFunctionModeTests::test_hop_eager, test/dynamo/test_modes.py::TorchFunctionModeTests::test_intermedate_torch_function_mode_construction_mutation, test/dynamo/test_modes.py::TorchFunctionModeTests::test_is_torch_function_all_disabled, test/dynamo/test_modes.py::TorchFunctionModeTests::test_len_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_nested_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_pop_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_push_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_register_hook, test/dynamo/test_modes.py::TorchFunctionModeTests::test_stack_state_clear_default_device, test/dynamo/test_modes.py::TorchFunctionModeTests::test_stack_state_mutation_default_device, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_and_pop_graph_break, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_and_pop_graph_break_mutation, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_disable, 
test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_enabled_guard, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_enter_exit, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_graph_break, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_guards_cpp, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_guards_py, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_highest_priority, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_preserves_cuda_rng_state, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_restore_on_exc, test/dynamo/test_modes.py::TorchFunctionModeLifecycleTests::test_default_device_restored_after_mode_tests 2025-12-04T12:59:23.4327141Z 2025-12-04T12:59:23.4327364Z Finished dynamo/test_modes 1/1 ... [2025-12-04 12:59:23.430255][14803.358547017], took 0.32min 2025-12-04T12:59:23.4596289Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_modes/dynamo.test_modes-84ffae5b03f7e325.xml 2025-12-04T12:59:23.5485698Z Running inductor/test_custom_partitioner_fn 1/1 ... [2025-12-04 12:59:23.548322][14803.476619996] 2025-12-04T12:59:23.5486209Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:59:23.5488899Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_custom_partitioner_fn.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:59:23.548628] 2025-12-04T12:59:32.6288788Z 2025-12-04T12:59:32.6289791Z inductor/test_custom_partitioner_fn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_custom_partitioner_fn_1.1_c3b6cd6a5d7fdcc1_.log 2025-12-04T12:59:32.6290980Z Running 1 items in this shard: test/inductor/test_custom_partitioner_fn.py::TestCustomPartitionerFn::test_custom_partitioner_fn 2025-12-04T12:59:32.6291485Z 2025-12-04T12:59:32.6291797Z Finished inductor/test_custom_partitioner_fn 1/1 ... [2025-12-04 12:59:32.628529][14812.556821916], took 0.15min 2025-12-04T12:59:32.6568247Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_custom_partitioner_fn/inductor.test_custom_partitioner_fn-7cdf0df133f39710.xml 2025-12-04T12:59:32.7419222Z Running dynamo/test_debug_utils 1/1 ... [2025-12-04 12:59:32.741656][14812.669954453] 2025-12-04T12:59:32.7419677Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:59:32.7422563Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_debug_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:59:32.741981] 2025-12-04T12:59:36.5632163Z 2025-12-04T12:59:36.5633015Z dynamo/test_debug_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_debug_utils_1.1_95c2c78decfe726b_.log 2025-12-04T12:59:36.5635332Z Running 4 items in this shard: test/dynamo/test_debug_utils.py::TestDebugUtilsCUDA::test_cast_model_to_fp64_dtype_args_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsCUDA::test_generate_env_vars_string_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsDeviceCUDA::test_aot_graph_parser_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsDeviceCUDA::test_sym_aot_graph_parser_cuda 2025-12-04T12:59:36.5636734Z 2025-12-04T12:59:36.5637014Z Finished dynamo/test_debug_utils 1/1 ... [2025-12-04 12:59:36.562963][14816.491259773], took 0.06min 2025-12-04T12:59:36.5911018Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_debug_utils/dynamo.test_debug_utils-34b85abff78e1075.xml 2025-12-04T12:59:36.6399660Z Running dynamo/test_base_hop 1/1 ... [2025-12-04 12:59:36.639728][14816.568026338] 2025-12-04T12:59:36.6400217Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:59:36.6402995Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_base_hop.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:59:36.640041] 2025-12-04T12:59:41.5128037Z 2025-12-04T12:59:41.5129349Z dynamo/test_base_hop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_base_hop_1.1_b2087f66aa3d673c_.log 2025-12-04T12:59:41.5132789Z Running 11 items in this shard: test/dynamo/test_base_hop.py::BaseHOPTest::test_aliasing_mutation_error, test/dynamo/test_base_hop.py::BaseHOPTest::test_aot_eager, test/dynamo/test_base_hop.py::BaseHOPTest::test_auto_functionalize, test/dynamo/test_base_hop.py::BaseHOPTest::test_dynamo, test/dynamo/test_base_hop.py::BaseHOPTest::test_eager_call, test/dynamo/test_base_hop.py::BaseHOPTest::test_int_input, test/dynamo/test_base_hop.py::BaseHOPTest::test_none_input, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_pytree_in_out, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_pytree_in_out_with_mutation, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_single_return, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_single_return_with_mutation 2025-12-04T12:59:41.5134912Z 2025-12-04T12:59:41.5135118Z Finished dynamo/test_base_hop 1/1 ... [2025-12-04 12:59:41.512373][14821.440668173], took 0.08min 2025-12-04T12:59:41.5408423Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_base_hop/dynamo.test_base_hop-f16fc4276458e68a.xml 2025-12-04T12:59:41.5726309Z Running dynamo/test_export 1/1 ... [2025-12-04 12:59:41.572385][14821.500685041] 2025-12-04T12:59:41.5726752Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:59:41.5729394Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_export.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:59:41.572684] 2025-12-04T13:00:07.0287594Z 2025-12-04T13:00:07.0288880Z dynamo/test_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_export_1.1_b7c3be727fa89598_.log 2025-12-04T13:00:07.0332519Z Running 186 items in this shard: test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_attr, test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_builtin, test/dynamo/test_export.py::ExportTests::test_byte_tensor_does_not_crash, test/dynamo/test_export.py::ExportTests::test_capture_symbolic_tracing_simple_within_fake_mode, test/dynamo/test_export.py::ExportTests::test_capture_symbolic_tracing_within_fake_mode, test/dynamo/test_export.py::ExportTests::test_cond_free_variables_overlapping, test/dynamo/test_export.py::ExportTests::test_cond_op_param_buffer_lifted, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_branch_args_mismatch, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_branch_return_multiple_tensors, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_branch_return_non_tensor, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_mismatch_return_length, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_mismatch_return_tensor_meta, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_missing_args, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_non_list_operands, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_non_tensor_operands, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_unsupported_pred, test/dynamo/test_export.py::ExportTests::test_cond_supported_pred_types, test/dynamo/test_export.py::ExportTests::test_constraint_violation_error_messages, test/dynamo/test_export.py::ExportTests::test_dataclass_input_output, 
test/dynamo/test_export.py::ExportTests::test_dict_return, test/dynamo/test_export.py::ExportTests::test_dict_return_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes, test/dynamo/test_export.py::ExportTests::test_dupes_2, test/dynamo/test_export.py::ExportTests::test_dupes_2_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_arg, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_arg_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_output, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_output_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dynamic_slicing, test/dynamo/test_export.py::ExportTests::test_dynamic_slicing_simple, test/dynamo/test_export.py::ExportTests::test_dynamo_enum_in_tuple, test/dynamo/test_export.py::ExportTests::test_dynamo_list_index, test/dynamo/test_export.py::ExportTests::test_empty, test/dynamo/test_export.py::ExportTests::test_enforce_equalities, test/dynamo/test_export.py::ExportTests::test_export, test/dynamo/test_export.py::ExportTests::test_export_compare_optimize_with_make_fx, test/dynamo/test_export.py::ExportTests::test_export_cond_in_aten_symbolic, test/dynamo/test_export.py::ExportTests::test_export_control_flow_with_getattr, test/dynamo/test_export.py::ExportTests::test_export_decomp, test/dynamo/test_export.py::ExportTests::test_export_decomp_asserts_bad_args, test/dynamo/test_export.py::ExportTests::test_export_defaults_ok, 
test/dynamo/test_export.py::ExportTests::test_export_dynamic_control_flow_error, test/dynamo/test_export.py::ExportTests::test_export_dynamic_dim_cleanup, test/dynamo/test_export.py::ExportTests::test_export_dynamic_dim_not_1, test/dynamo/test_export.py::ExportTests::test_export_dynamic_dim_range_constraint, test/dynamo/test_export.py::ExportTests::test_export_graph_bypass, test/dynamo/test_export.py::ExportTests::test_export_graph_bypass_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_graph_with_complex_reorder, test/dynamo/test_export.py::ExportTests::test_export_graph_with_complex_reorder_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_graph_with_list, test/dynamo/test_export.py::ExportTests::test_export_graph_with_list_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_identity, test/dynamo/test_export.py::ExportTests::test_export_masking_with_no_grad, test/dynamo/test_export.py::ExportTests::test_export_meta, test/dynamo/test_export.py::ExportTests::test_export_meta_val, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out_2, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out_2_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_module_specify_constraints_signature, test/dynamo/test_export.py::ExportTests::test_export_multi_dynamic_dim_constraint, test/dynamo/test_export.py::ExportTests::test_export_multi_dynamic_dim_unsafe_relationship, test/dynamo/test_export.py::ExportTests::test_export_nn_module_stack_patched_module, test/dynamo/test_export.py::ExportTests::test_export_no_raise, test/dynamo/test_export.py::ExportTests::test_export_no_tensor_computation_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_pass_arg_by_name, 
test/dynamo/test_export.py::ExportTests::test_export_pass_arg_by_name_star_args, test/dynamo/test_export.py::ExportTests::test_export_persist_assert, test/dynamo/test_export.py::ExportTests::test_export_preserve_constraints_as_metadata_tensor, test/dynamo/test_export.py::ExportTests::test_export_preserves_nn_module_stack_for_get_attr, test/dynamo/test_export.py::ExportTests::test_export_raise_guard_full_constraint, test/dynamo/test_export.py::ExportTests::test_export_raise_guard_partial_constraint, test/dynamo/test_export.py::ExportTests::test_export_raise_on_relationship, test/dynamo/test_export.py::ExportTests::test_export_shape_control_flow_1, test/dynamo/test_export.py::ExportTests::test_export_specialized_int, test/dynamo/test_export.py::ExportTests::test_export_symbolic_shape, test/dynamo/test_export.py::ExportTests::test_export_with_args_and_empty_kwargs, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_None, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_float, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_tensor, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_tuple, test/dynamo/test_export.py::ExportTests::test_export_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_with_builtin_op_on_assume_constant, test/dynamo/test_export.py::ExportTests::test_export_with_cond_branches_calling_methods, test/dynamo/test_export.py::ExportTests::test_export_with_cond_closure, test/dynamo/test_export.py::ExportTests::test_export_with_cond_dynamic_shape_pred, test/dynamo/test_export.py::ExportTests::test_export_with_cond_with_closed_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_dict_values, test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function_and_class_method, 
test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function_and_class_method_multiarg, test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function_and_class_method_multiarg_diff, test/dynamo/test_export.py::ExportTests::test_export_with_constant_global_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_in_unspecialized_nn_module, test/dynamo/test_export.py::ExportTests::test_export_with_constant_list_nonzero, test/dynamo/test_export.py::ExportTests::test_export_with_constant_list_nonzero_free_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_method_on_module, test/dynamo/test_export.py::ExportTests::test_export_with_constant_method_on_module_invoke_twice, test/dynamo/test_export.py::ExportTests::test_export_with_constant_none_control_flow, test/dynamo/test_export.py::ExportTests::test_export_with_constant_none_control_flow_free_func, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_none_control_flow, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_none_control_flow_free_func, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_none_control_flow_pos, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_return_const, test/dynamo/test_export.py::ExportTests::test_export_with_constant_tuple_nonzero, test/dynamo/test_export.py::ExportTests::test_export_with_functools_wrapped_fn, test/dynamo/test_export.py::ExportTests::test_export_with_functools_wrapped_method, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_and_empty_args, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_None, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_float, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_tensor, 
test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_tuple, test/dynamo/test_export.py::ExportTests::test_export_with_map_cond, test/dynamo/test_export.py::ExportTests::test_export_with_map_zero_sized_tensor, test/dynamo/test_export.py::ExportTests::test_export_with_map_zero_sized_tensor_suppress_errors, test/dynamo/test_export.py::ExportTests::test_export_with_module_layer, test/dynamo/test_export.py::ExportTests::test_export_with_nonzero_static, test/dynamo/test_export.py::ExportTests::test_export_with_shallow_list_copy_with_side_effects, test/dynamo/test_export.py::ExportTests::test_export_with_shallow_list_copy_wo_side_effects, test/dynamo/test_export.py::ExportTests::test_export_with_stack_trace, test/dynamo/test_export.py::ExportTests::test_export_with_symbool_inputs, test/dynamo/test_export.py::ExportTests::test_export_with_wrapped_fn, test/dynamo/test_export.py::ExportTests::test_exported_graph_serialization, test/dynamo/test_export.py::ExportTests::test_func_return, test/dynamo/test_export.py::ExportTests::test_func_return_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_fx_pytree, test/dynamo/test_export.py::ExportTests::test_immutable_list_dict, test/dynamo/test_export.py::ExportTests::test_input_container_type, test/dynamo/test_export.py::ExportTests::test_input_global, test/dynamo/test_export.py::ExportTests::test_input_global_multiple_access, test/dynamo/test_export.py::ExportTests::test_input_nonlocal, test/dynamo/test_export.py::ExportTests::test_input_unused_nonlocal_ok, test/dynamo/test_export.py::ExportTests::test_list_contains, test/dynamo/test_export.py::ExportTests::test_list_not_contains, test/dynamo/test_export.py::ExportTests::test_list_unpack, test/dynamo/test_export.py::ExportTests::test_list_unpack_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_map_cond_param_buffer_lifted, test/dynamo/test_export.py::ExportTests::test_mixed_real_and_fake_inputs, 
test/dynamo/test_export.py::ExportTests::test_multiple_outputs_op_with_evaluator, test/dynamo/test_export.py::ExportTests::test_nested_cond_op_param_buffer_lifted, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation_2, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation_2_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation_fail, test/dynamo/test_export.py::ExportTests::test_not_functionalize, test/dynamo/test_export.py::ExportTests::test_param_buffer_safe_from_mutation_recurse, test/dynamo/test_export.py::ExportTests::test_param_buffer_safe_from_mutation_simple, test/dynamo/test_export.py::ExportTests::test_pre_dispatch_simple, test/dynamo/test_export.py::ExportTests::test_predispatch_with_for_out_dtype, test/dynamo/test_export.py::ExportTests::test_predispatch_with_for_out_dtype_nested, test/dynamo/test_export.py::ExportTests::test_predispatch_with_higher_order, test/dynamo/test_export.py::ExportTests::test_predispatch_with_higher_order_nested, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata_graph_break, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata_inline, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata_recompile, test/dynamo/test_export.py::ExportTests::test_remove_redundant_dynamic_dim_in_error_message, test/dynamo/test_export.py::ExportTests::test_retracibility, test/dynamo/test_export.py::ExportTests::test_retracibility_dict_container_inp_out, test/dynamo/test_export.py::ExportTests::test_retracibility_nested_list_out, test/dynamo/test_export.py::ExportTests::test_round_dynamic_shapes, test/dynamo/test_export.py::ExportTests::test_strict_fake_tensor_prop_real_tensors, test/dynamo/test_export.py::ExportTests::test_subclass_parameters, 
test/dynamo/test_export.py::ExportTests::test_sum_param, test/dynamo/test_export.py::ExportTests::test_sym_contains, test/dynamo/test_export.py::ExportTests::test_symbolic_tracing_within_fake_mode_with_constraints, test/dynamo/test_export.py::ExportTests::test_symbolic_tracing_within_fake_mode_with_constraints_with_parameters, test/dynamo/test_export.py::ExportTests::test_symbool, test/dynamo/test_export.py::ExportTests::test_torch_inference_mode_ctx, test/dynamo/test_export.py::ExportTests::test_trivial_constraint, test/dynamo/test_export.py::ExportTests::test_uncaptured_higher_order_op_error_not_suppresed, test/dynamo/test_export.py::ExportTests::test_untracked_inputs_in_constraints, test/dynamo/test_export.py::ExportTests::test_zeroes_in_and_out_different_shape_on_test, test/dynamo/test_export.py::ExportTests::test_zeroes_in_and_out_different_shape_on_test_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_zeroes_in_new_shape_scalar_out, test/dynamo/test_export.py::ExportTests::test_zeroes_in_new_shape_scalar_out_permute, test/dynamo/test_export.py::ExportTests::test_zeroes_in_new_shape_scalar_out_permute_dupe_and_bypass, test/dynamo/test_export.py::ExportTestsDeviceCUDA::test_export_fast_binary_broadcast_check_cuda, test/dynamo/test_export.py::ExportTestsDeviceCUDA::test_export_fast_binary_broadcast_check_unbacked_cuda, test/dynamo/test_export.py::ExportTestsDeviceCUDA::test_export_with_parameters_cuda 2025-12-04T13:00:07.0372632Z 2025-12-04T13:00:07.0372843Z Finished dynamo/test_export 1/1 ... [2025-12-04 13:00:07.029110][14846.957407523], took 0.42min 2025-12-04T13:00:07.0578983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_export/dynamo.test_export-b39b0ef66a188fdf.xml 2025-12-04T13:00:07.1568100Z Running dynamo/test_python_dispatcher 1/1 ... 
[2025-12-04 13:00:07.156570][14847.084869568] 2025-12-04T13:00:07.1568852Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:00:07.1571129Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_python_dispatcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:07.156848] 2025-12-04T13:00:11.4285251Z 2025-12-04T13:00:11.4287485Z dynamo/test_python_dispatcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_dispatcher_1.1_786a010f0f7c3940_.log 2025-12-04T13:00:11.4290386Z Running 6 items in this shard: test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key1, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key2, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key3, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key4, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key_set_guard, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_functorch_interpreter 2025-12-04T13:00:11.4291971Z 2025-12-04T13:00:11.4292206Z Finished dynamo/test_python_dispatcher 1/1 ... [2025-12-04 13:00:11.428114][14851.356408619], took 0.07min 2025-12-04T13:00:11.4568965Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-89e5f4289c609732.xml 2025-12-04T13:00:11.4905676Z Running export/test_swap 1/1 ... 
[2025-12-04 13:00:11.490332][14851.418631126] 2025-12-04T13:00:11.4906348Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:00:11.4909434Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_swap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:11.490611] 2025-12-04T13:00:17.1644093Z 2025-12-04T13:00:17.1645179Z export/test_swap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_swap_1.1_95c916183e2c0305_.log 2025-12-04T13:00:17.1650667Z Running 20 items in this shard: test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_args, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs_use_private, test/export/test_swap.py::TestSwap_nonstrict::test_custom_output, test/export/test_swap.py::TestSwap_nonstrict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_nonstrict::test_nested_leaf, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_with_unused_input, test/export/test_swap.py::TestSwap_strict::test_custom_input_args, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs_use_private, test/export/test_swap.py::TestSwap_strict::test_custom_output, test/export/test_swap.py::TestSwap_strict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_strict::test_nested_leaf, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_different_order, 
test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_with_unused_input 2025-12-04T13:00:17.1654961Z 2025-12-04T13:00:17.1655153Z Finished export/test_swap 1/1 ... [2025-12-04 13:00:17.164154][14857.092451423], took 0.09min 2025-12-04T13:00:17.1930037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_swap/export.test_swap-4cc8060b16634bb1.xml 2025-12-04T13:00:17.2268365Z Running export/test_unflatten 1/1 ... [2025-12-04 13:00:17.226565][14857.154864024] 2025-12-04T13:00:17.2269081Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:00:17.2271878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:00:17.226864] 2025-12-04T13:00:29.4114830Z 2025-12-04T13:00:29.4115978Z export/test_unflatten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_1.1_d630949213564262_.log 2025-12-04T13:00:29.4126075Z Running 29 items in this shard: test/export/test_unflatten.py::TestUnflatten::test_assert_tensor_metadata_stack, test/export/test_unflatten.py::TestUnflatten::test_attr_as_submod_input, test/export/test_unflatten.py::TestUnflatten::test_dedup_sym_size, test/export/test_unflatten.py::TestUnflatten::test_double_nested_submodule, test/export/test_unflatten.py::TestUnflatten::test_duplicate_placeholder, test/export/test_unflatten.py::TestUnflatten::test_fx_trace, test/export/test_unflatten.py::TestUnflatten::test_nested_leaf_non_strict, test/export/test_unflatten.py::TestUnflatten::test_placeholder_and_get_attr_ordering_after_unflattened, test/export/test_unflatten.py::TestUnflatten::test_simple_alias, test/export/test_unflatten.py::TestUnflatten::test_unflatten_buffer_mutation, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_obj, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_tensor, test/export/test_unflatten.py::TestUnflatten::test_unflatten_container_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_eager, test/export/test_unflatten.py::TestUnflatten::test_unflatten_empty_branch, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested_access, test/export/test_unflatten.py::TestUnflatten::test_unflatten_none, test/export/test_unflatten.py::TestUnflatten::test_unflatten_param_list_dict, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_signature, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_with_unused_input, test/export/test_unflatten.py::TestUnflatten::test_unflatten_requires_grad_param, 
test/export/test_unflatten.py::TestUnflatten::test_unflatten_root_module_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_shared_submodule, test/export/test_unflatten.py::TestUnflatten::test_unflatten_skipped_call_module, test/export/test_unflatten.py::TestUnflatten::test_unflatten_submodule_ordering, test/export/test_unflatten.py::TestUnflatten::test_unflatten_with_inplace_compile, test/export/test_unflatten.py::TestUnflatten::test_unflatten_wrong_input, test/export/test_unflatten.py::TestUnflatten::test_unflattened_module_nodes_has_meta_val 2025-12-04T13:00:29.4132765Z 2025-12-04T13:00:29.4133015Z Finished export/test_unflatten 1/1 ... [2025-12-04 13:00:29.411194][14869.339486878], took 0.20min 2025-12-04T13:00:29.4407434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-9dcd937885307cef.xml 2025-12-04T13:00:29.5289520Z Running dynamo/test_verify_correctness 1/1 ... [2025-12-04 13:00:29.528694][14869.456992509] 2025-12-04T13:00:29.5290145Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:00:29.5292684Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_verify_correctness.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:00:29.528981] 2025-12-04T13:00:33.5008341Z 2025-12-04T13:00:33.5009272Z dynamo/test_verify_correctness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_verify_correctness_1.1_d2d881eebc4cfc16_.log 2025-12-04T13:00:33.5011342Z Running 4 items in this shard: test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_example_inputs, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_false, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_true, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_torchscript 2025-12-04T13:00:33.5012749Z 2025-12-04T13:00:33.5013071Z Finished dynamo/test_verify_correctness 1/1 ... [2025-12-04 13:00:33.500548][14873.428843712], took 0.07min 2025-12-04T13:00:33.5293579Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-22b40053bc190597.xml 2025-12-04T13:00:33.5551507Z Running dynamo/test_wrap_inductor_compiled_regions 1/1 ... [2025-12-04 13:00:33.554914][14873.483212913] 2025-12-04T13:00:33.5552068Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:00:33.5554536Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_wrap_inductor_compiled_regions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:00:33.555202] 2025-12-04T13:00:50.9991367Z 2025-12-04T13:00:50.9992658Z dynamo/test_wrap_inductor_compiled_regions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_3fbc3d993fa8b554_.log 2025-12-04T13:00:51.0001192Z Running 18 items in this shard: test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_must_save, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_prefer_recompute, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_wrapper_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_backward, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_flex_attention, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_config_affects_cache_key, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_default_disabled, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_disabled_not_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_enabled_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_no_dispatch_mode_no_hop_invoked, 
test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_option_type_validation, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_per_compilation, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_backward, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_multiple_ops 2025-12-04T13:00:51.0007866Z 2025-12-04T13:00:51.0008142Z Finished dynamo/test_wrap_inductor_compiled_regions 1/1 ... [2025-12-04 13:00:50.998844][14890.927135691], took 0.29min 2025-12-04T13:00:51.0280721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-e50f738759450405.xml 2025-12-04T13:00:51.1085537Z Running dynamo/test_cudagraphs_expandable_segments 1/1 ... [2025-12-04 13:00:51.108271][14891.036570516] 2025-12-04T13:00:51.1086070Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:00:51.1088729Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs_expandable_segments.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:00:51.108589] 2025-12-04T13:00:56.3821463Z 2025-12-04T13:00:56.3823543Z dynamo/test_cudagraphs_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_expandable_segments_1.1_461bf64d7b370157_.log 2025-12-04T13:00:56.3828472Z Running 8 items in this shard: test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_basic, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_dead_fill, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_dtoh, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_factory, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_htod, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutate_constant, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutate_input, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutated_metadata 2025-12-04T13:00:56.3831258Z 2025-12-04T13:00:56.3831547Z Finished dynamo/test_cudagraphs_expandable_segments 1/1 ... [2025-12-04 13:00:56.381747][14896.309998118], took 0.09min 2025-12-04T13:00:56.4112807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_cudagraphs_expandable_segments/dynamo.test_cudagraphs_expandable_segments-6088bb8977cfc034.xml 2025-12-04T13:00:56.4419725Z Running inductor/test_caching 1/1 ... [2025-12-04 13:00:56.441740][14896.370039306] 2025-12-04T13:00:56.4420175Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:00:56.4422916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_caching.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:00:56.442046] 2025-12-04T13:02:43.5653375Z 2025-12-04T13:02:43.5654545Z inductor/test_caching 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_caching_1.1_ee4444a2502ab159_.log 2025-12-04T13:02:43.5718685Z Running 169 items in this shard: test/inductor/test_caching.py::ConfigTest::test_versioned_config_env_var_override_enabled_False, test/inductor/test_caching.py::ConfigTest::test_versioned_config_env_var_override_enabled_True, test/inductor/test_caching.py::ConfigTest::test_versioned_config_jk_failure, test/inductor/test_caching.py::ConfigTest::test_versioned_config_oss_default_enabled_False, test/inductor/test_caching.py::ConfigTest::test_versioned_config_oss_default_enabled_True, test/inductor/test_caching.py::ConfigTest::test_versioned_config_version_check_enabled_False, test/inductor/test_caching.py::ConfigTest::test_versioned_config_version_check_enabled_True, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_False_all_compile_context_False, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_False_all_compile_context_True, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_True_all_compile_context_False, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_True_all_compile_context_True, test/inductor/test_caching.py::ContextTest::test_isolation_key_is_distinct, test/inductor/test_caching.py::ContextTest::test_isolation_key_is_repeatable, test/inductor/test_caching.py::ContextTest::test_select_compile_context_matches_forms_of_context, test/inductor/test_caching.py::ContextTest::test_select_runtime_context_matches_forms_of_context, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected0, 
test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected10, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected3, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected5, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected9, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected0, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected10, 
test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected3, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected5, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected9, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected0, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected10, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected3, 
test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected5, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected9, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected0, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected10, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected3, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected5, 
test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected9, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CacheError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CustomParamsEncoderRequiredError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CustomResultDecoderRequiredError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CustomResultEncoderRequiredError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_DeterministicCachingRequiresStrongConsistencyError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_FileLockTimeoutError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_KeyEncodingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_KeyPicklingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_LockTimeoutError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_SystemError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_UserError, 
test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValueDecodingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValueEncodingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValuePicklingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValueUnPicklingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_other, test/inductor/test_caching.py::ImplementationsTest::test_get_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_get_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_will_not_overwrite_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_will_not_overwrite_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_key_encoding_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_key_encoding_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_decoding_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_decoding_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_encoding_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_encoding_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_version_mismatch_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_version_mismatch_impl_typename__OnDiskCacheImpl, 
test/inductor/test_caching.py::InterfacesTest::test_caching_module_disabled_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_caching_module_disabled_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_ischema_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_ischema_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_params_encoder_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_params_encoder_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_result_encoder_and_decoder_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_result_encoder_and_decoder_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_defaults_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_defaults_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_deterministic_caching_disabled, test/inductor/test_caching.py::InterfacesTest::test_params_encoder_required_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_params_encoder_required_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_and_decoder_required_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_and_decoder_required_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_required_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_required_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_strictly_cached_determinism, 
test/inductor/test_caching.py::InterfacesTest::test_strictly_pre_populated_determinism, test/inductor/test_caching.py::LocksTest::test_BLOCKING, test/inductor/test_caching.py::LocksTest::test_BLOCKING_WITH_TIMEOUT, test/inductor/test_caching.py::LocksTest::test_NON_BLOCKING, test/inductor/test_caching.py::LocksTest::test_acquire_many_impl_locks_with_timeout_impl_typename_combos0, test/inductor/test_caching.py::LocksTest::test_acquire_many_impl_locks_with_timeout_impl_typename_combos1, test/inductor/test_caching.py::LocksTest::test_acquire_many_impl_locks_with_timeout_impl_typename_combos2, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_unlocked, 
test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_unlocked, 
test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_unlocked, 
test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_after_timeout, 
test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::UtilsTest::test_lru_cache, test/inductor/test_caching.py::UtilsTest::test_try_pickle_key_pickle_able_False, test/inductor/test_caching.py::UtilsTest::test_try_pickle_key_pickle_able_True, test/inductor/test_caching.py::UtilsTest::test_try_pickle_value_pickle_able_False, test/inductor/test_caching.py::UtilsTest::test_try_pickle_value_pickle_able_True, test/inductor/test_caching.py::UtilsTest::test_try_unpickle_value_unpickle_able_False, test/inductor/test_caching.py::UtilsTest::test_try_unpickle_value_unpickle_able_True 2025-12-04T13:02:43.5779318Z 2025-12-04T13:02:43.5779556Z Finished inductor/test_caching 1/1 ... [2025-12-04 13:02:43.565301][15003.493596006], took 1.79min 2025-12-04T13:02:43.5960242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_caching/inductor.test_caching-81c36c30e8c9f16c.xml 2025-12-04T13:02:43.6726708Z Running dynamo/test_reorder_logs 1/1 ... [2025-12-04 13:02:43.672413][15003.600710394] 2025-12-04T13:02:43.6727186Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:02:43.6729621Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_reorder_logs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:02:43.672702] 2025-12-04T13:02:48.2954227Z 2025-12-04T13:02:48.2955093Z dynamo/test_reorder_logs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reorder_logs_1.1_bce7080c1339fb4f_.log 2025-12-04T13:02:48.2960964Z Running 14 items in this shard: test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method0_fn0_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method1_fn1_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method2_fn2_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method3_fn3_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method4_fn4_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method5_fn5_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method6_fn6_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method7_fn7_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_constant_mutation, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_dont_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_custom_log_fn, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print_graph_break, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_warnings 2025-12-04T13:02:48.2965197Z 2025-12-04T13:02:48.2965414Z Finished dynamo/test_reorder_logs 1/1 ... 
[2025-12-04 13:02:48.295190][15008.223484427], took 0.08min 2025-12-04T13:02:48.3249999Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-8774cf5ade30b7b9.xml 2025-12-04T13:02:48.3578775Z Running dynamo/test_subclasses 1/1 ... [2025-12-04 13:02:48.357633][15008.285931799] 2025-12-04T13:02:48.3579226Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:02:48.3581933Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_subclasses.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:02:48.357931] 2025-12-04T13:03:19.9250150Z 2025-12-04T13:03:19.9255832Z dynamo/test_subclasses 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_subclasses_1.1_0f5f75f18480ff60_.log 2025-12-04T13:03:19.9289730Z Running 126 items in this shard: test/dynamo/test_subclasses.py::SubclassTests::test_as_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_base_torch_function_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_compile_higher_order_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_automatic_dynamic, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_dynamic_dim, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values_graph_break, test/dynamo/test_subclasses.py::SubclassTests::test_has_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_make_subclass, 
test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_newly_constructed_tensor_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_buffer, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_cat, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_simple, test/dynamo/test_subclasses.py::SubclassTests::test_no_call_to_new, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_on_size_bytecode, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_recompiles, test/dynamo/test_subclasses.py::SubclassTests::test_nontraceable_tensor_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_overridden_method_guarding, test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_custom_torch_func_and_dynamic_attr, test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_with_old_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_recompile_with_symbool_inputs, test/dynamo/test_subclasses.py::SubclassTests::test_recompiles_with_optional_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_return_as_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_local_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_TwoTensor_TwoTensor, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_nested_diff_sizes, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_constructor_proxying, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_attr, 
test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_method, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_override_shape_and_to, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_parameters_are_static_under_training, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_with_disabled_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_support_bases, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_automatic_dynamic_shapes, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_clone_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_different_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mark_dynamic_shapes, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_nested, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_multiple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_tensor_and_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_simple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_attr_codegen_tos, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_arg_num, 
test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_not_classmethod, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_override, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_recursive_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_custom_attr, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_with_non_classmethod_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_torch_dispatch_subclass_guard_recompile, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_attr, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method_arg, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_list_args, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_graph_break, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_guards, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_nested, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_subclass_survives_into_aot_autograd, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class_with_kwargs, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_tensor, 
test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_attr_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_method_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_property_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_dynamo_attribute_access_on_intermediate, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_guards_on_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_differently_sized_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_same_sized_inner_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd_inductor, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_recompiles, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_4, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_6, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_4, 
test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_is_nested_call, test/dynamo/test_subclasses.py::TestNestedTensor::test_inference_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_inline_nested_tensor_from_jagged, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_False, 
test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_dense_subclass_dense_subclass, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_subclass_dense, test/dynamo/test_subclasses.py::TestNestedTensor::test_param_subclass_isinstance_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_return_shape, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_dense_subclass_dense_view, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_gives_static_shapes_when_dynamic_false, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_with_mutation_in_graph, test/dynamo/test_subclasses.py::TestNestedTensor::test_unary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_unbind 2025-12-04T13:03:19.9322043Z 2025-12-04T13:03:19.9322276Z Finished dynamo/test_subclasses 1/1 ... [2025-12-04 13:03:19.925249][15039.853537264], took 0.53min 2025-12-04T13:03:19.9556809Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-debad3a483737c49.xml 2025-12-04T13:03:20.0410574Z Running dynamo/test_comptime 1/1 ... [2025-12-04 13:03:20.040824][15039.969122225] 2025-12-04T13:03:20.0411198Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:03:20.0414023Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_comptime.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:03:20.041141] 2025-12-04T13:03:29.2728986Z 2025-12-04T13:03:29.2730216Z dynamo/test_comptime 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_comptime_1.1_ee7f00d5f391ca6c_.log 2025-12-04T13:03:29.2733562Z Running 12 items in this shard: test/dynamo/test_comptime.py::ComptimeTests::test_get_local, test/dynamo/test_comptime.py::ComptimeTests::test_get_local_closure_variable, test/dynamo/test_comptime.py::ComptimeTests::test_graph_break, test/dynamo/test_comptime.py::ComptimeTests::test_print_bt, test/dynamo/test_comptime.py::ComptimeTests::test_print_direct, test/dynamo/test_comptime.py::ComptimeTests::test_print_disas, test/dynamo/test_comptime.py::ComptimeTests::test_print_graph, test/dynamo/test_comptime.py::ComptimeTests::test_print_guards, test/dynamo/test_comptime.py::ComptimeTests::test_print_locals, test/dynamo/test_comptime.py::ComptimeTests::test_print_single, test/dynamo/test_comptime.py::ComptimeTests::test_print_value_stack, test/dynamo/test_comptime.py::ComptimeTests::test_sleep 2025-12-04T13:03:29.2735831Z 2025-12-04T13:03:29.2736052Z Finished dynamo/test_comptime 1/1 ... [2025-12-04 13:03:29.272553][15049.200842411], took 0.15min 2025-12-04T13:03:29.3024738Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_comptime/dynamo.test_comptime-47f6d1d4947f2a1a.xml 2025-12-04T13:03:29.3824680Z Running test_privateuseone_python_backend 1/1 ... [2025-12-04 13:03:29.382170][15049.310467862] 2025-12-04T13:03:29.3825479Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:03:29.3827753Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_privateuseone_python_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:03:29.382467] 2025-12-04T13:03:32.5527714Z 2025-12-04T13:03:32.5528884Z test_privateuseone_python_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_privateuseone_python_backend_1.1_d1fcfb50f1d5d34a_.log 2025-12-04T13:03:32.5530327Z Running 2 items in this shard: test/test_privateuseone_python_backend.py::PrivateUse1BackendTest::test_accessing_is_pinned, test/test_privateuseone_python_backend.py::PrivateUse1BackendTest::test_backend_simple 2025-12-04T13:03:32.5531124Z 2025-12-04T13:03:32.5531443Z Finished test_privateuseone_python_backend 1/1 ... [2025-12-04 13:03:32.552378][15052.480667337], took 0.05min 2025-12-04T13:03:32.5829144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_privateuseone_python_backend/test_privateuseone_python_backend-af92c89cf1e734fe.xml 2025-12-04T13:03:32.6162381Z Running functorch/test_rearrange 1/1 ... [2025-12-04 13:03:32.615972][15052.544270333] 2025-12-04T13:03:32.6163159Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:03:32.6165967Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_rearrange.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:03:32.616280] 2025-12-04T13:03:35.8864991Z 2025-12-04T13:03:35.8866069Z functorch/test_rearrange 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_rearrange_1.1_f58f64fb207f9ea4_.log 2025-12-04T13:03:35.8869563Z Running 10 items in this shard: test/functorch/test_rearrange.py::TestRearrange::test_0_dim_tensor, test/functorch/test_rearrange.py::TestRearrange::test_collapsed_ellipsis_errors_out, test/functorch/test_rearrange.py::TestRearrange::test_concatenations_and_stacking, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_no_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_with_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_ellipsis_ops, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_consistency, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_permutations, test/functorch/test_rearrange.py::TestRearrange::test_squeeze, test/functorch/test_rearrange.py::TestRearrange::test_unsqueeze 2025-12-04T13:03:35.8872432Z 2025-12-04T13:03:35.8872708Z Finished functorch/test_rearrange 1/1 ... [2025-12-04 13:03:35.886225][15055.814516856], took 0.05min 2025-12-04T13:03:35.9162739Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-2881d42f49f0d5f4.xml 2025-12-04T13:03:35.9633621Z Running functorch/test_parsing 1/1 ... [2025-12-04 13:03:35.963115][15055.891414039] 2025-12-04T13:03:35.9634389Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:03:35.9637295Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_parsing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:03:35.963419] 2025-12-04T13:03:39.1839226Z 2025-12-04T13:03:39.1841015Z functorch/test_parsing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_parsing_1.1_ab4f4097a5615c37_.log 2025-12-04T13:03:39.1845980Z Running 12 items in this shard: test/functorch/test_parsing.py::TestAnonymousAxis::test_anonymous_axes, test/functorch/test_parsing.py::TestParsedExpression::test_elementary_axis_name, test/functorch/test_parsing.py::TestParsedExpression::test_invalid_expressions, test/functorch/test_parsing.py::TestParsedExpression::test_parse_expression, test/functorch/test_parsing.py::TestParsingUtils::test_ellipsis_invalid_identifier, test/functorch/test_parsing.py::TestParsingUtils::test_ellipsis_matching, test/functorch/test_parsing.py::TestParsingUtils::test_left_parenthesized_ellipsis, test/functorch/test_parsing.py::TestParsingUtils::test_parse_pattern_number_of_arrows, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_identifier_mismatch, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_non_unitary_anonymous_axes_raises_error, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_unexpected_axes_lengths, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_validate_axes_lengths_are_integers 2025-12-04T13:03:39.1849520Z 2025-12-04T13:03:39.1849762Z Finished functorch/test_parsing 1/1 ... [2025-12-04 13:03:39.183430][15059.111719469], took 0.05min 2025-12-04T13:03:39.2147309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_parsing/functorch.test_parsing-b19da695e97869b8.xml 2025-12-04T13:03:39.2441926Z Running test_varlen_attention 1/1 ... 
[2025-12-04 13:03:39.243955][15059.172253813] 2025-12-04T13:03:39.2442643Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:03:39.2445650Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_varlen_attention.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:39.244252] 2025-12-04T13:03:45.1190593Z 2025-12-04T13:03:45.1191647Z test_varlen_attention 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_varlen_attention_1.1_7b4553019fa7c3b1_.log 2025-12-04T13:03:45.1199722Z Running 22 items in this shard: test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_basic_functionality_bfloat16_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_basic_functionality_float16_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_False_num_perms_1_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_False_num_perms_3_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_False_num_perms_5_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_True_num_perms_1_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_True_num_perms_3_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_True_num_perms_5_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_False_num_perms_1_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_False_num_perms_3_cuda_float16, 
test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_False_num_perms_5_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_True_num_perms_1_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_True_num_perms_3_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_True_num_perms_5_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_compliance_bfloat16_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_compliance_float16_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_registration_bfloat16_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_registration_float16_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_bfloat16_is_causal_False_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_bfloat16_is_causal_True_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_float16_is_causal_False_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_float16_is_causal_True_cuda_float16 2025-12-04T13:03:45.1207416Z 2025-12-04T13:03:45.1207630Z Finished test_varlen_attention 1/1 ... [2025-12-04 13:03:45.118676][15065.046967557], took 0.10min 2025-12-04T13:03:45.1500679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_varlen_attention/test_varlen_attention-c77e9d85fce2d7de.xml 2025-12-04T13:03:45.1823625Z Running test_mkl_verbose 1/1 ... 
[2025-12-04 13:03:45.182114][15065.110413069] 2025-12-04T13:03:45.1824042Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:03:45.1826894Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkl_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:45.182421] 2025-12-04T13:03:51.8089096Z 2025-12-04T13:03:51.8090029Z test_mkl_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkl_verbose_1.1_8c6cbee907023829_.log 2025-12-04T13:03:51.8091076Z Running 2 items in this shard: test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_off, test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_on 2025-12-04T13:03:51.8091637Z 2025-12-04T13:03:51.8091869Z Finished test_mkl_verbose 1/1 ... [2025-12-04 13:03:51.808552][15071.736843761], took 0.11min 2025-12-04T13:03:51.8394672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-aebaabaedf0418a4.xml 2025-12-04T13:03:51.9189274Z Running test_cpp_api_parity 1/1 ... [2025-12-04 13:03:51.918674][15071.846973157] 2025-12-04T13:03:51.9189939Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:03:51.9192595Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_api_parity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:03:51.919001] 2025-12-04T13:04:08.2115668Z 2025-12-04T13:04:08.2116504Z test_cpp_api_parity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_api_parity_1.1_c465bf1c09b2866a_.log 2025-12-04T13:04:08.2244997Z Running 488 items in this shard: test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special_cuda
2025-12-04T13:04:08.2369084Z
2025-12-04T13:04:08.2369418Z Finished test_cpp_api_parity 1/1 ... [2025-12-04 13:04:08.212067][15088.140362196], took 0.27min
2025-12-04T13:04:08.2448992Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-b67630ba146aa7a3.xml
2025-12-04T13:04:08.3302730Z Running test_autoload 1/1 ... [2025-12-04 13:04:08.330017][15088.258315537]
2025-12-04T13:04:08.3303150Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:08.3305840Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autoload.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:08.330314]
2025-12-04T13:04:11.5005818Z
2025-12-04T13:04:11.5006580Z test_autoload 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autoload_1.1_7ada199d937afcb8_.log
2025-12-04T13:04:11.5007479Z Running 1 items in this shard: test/test_autoload.py::TestDeviceBackendAutoload::test_autoload
2025-12-04T13:04:11.5007864Z
2025-12-04T13:04:11.5008112Z Finished test_autoload 1/1 ... [2025-12-04 13:04:11.500258][15091.428550028], took 0.05min
2025-12-04T13:04:11.5324687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autoload/test_autoload-f8ddaf02f0fba12a.xml
2025-12-04T13:04:11.5592470Z Running nn/attention/test_open_registry 1/1 ... [2025-12-04 13:04:11.559008][15091.487307781]
2025-12-04T13:04:11.5592965Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:11.5595803Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/attention/test_open_registry.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:11.559305]
2025-12-04T13:04:14.7796063Z
2025-12-04T13:04:14.7797293Z nn/attention/test_open_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.attention.test_open_registry_1.1_7d49315786a6f063_.log
2025-12-04T13:04:14.7798822Z Running 2 items in this shard: test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_activate_unknown_impl_errors, test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_register_and_activate_impl
2025-12-04T13:04:14.7799839Z
2025-12-04T13:04:14.7800293Z Finished nn/attention/test_open_registry 1/1 ... [2025-12-04 13:04:14.779312][15094.707602895], took 0.05min
2025-12-04T13:04:14.8114325Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-090bcad7e8d69cda.xml
2025-12-04T13:04:14.8400658Z Running xpu/test_fusion 1/1 ... [2025-12-04 13:04:14.839824][15094.768123868]
2025-12-04T13:04:14.8401123Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:14.8403903Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:14.840123]
2025-12-04T13:04:17.9405522Z
2025-12-04T13:04:17.9406317Z xpu/test_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_fusion_1.1_008b7f4febfa87c6_.log
2025-12-04T13:04:17.9406994Z Running 0 items in this shard:
2025-12-04T13:04:17.9407176Z
2025-12-04T13:04:17.9407680Z Finished xpu/test_fusion 1/1 ... [2025-12-04 13:04:17.940337][15097.868634219], took 0.05min
2025-12-04T13:04:17.9727884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/xpu.test_fusion/xpu.test_fusion-5eb21ed31dcf28c7.xml
2025-12-04T13:04:17.9972827Z Running test_foreach 1/1 ... [2025-12-04 13:04:17.997079][15097.925378126]
2025-12-04T13:04:17.9973429Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:17.9976219Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_foreach.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:17.997365]
2025-12-04T13:14:09.9922552Z
2025-12-04T13:14:09.9925418Z test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_foreach_1.1_76ff14b11afb6f5e_.log
2025-12-04T13:14:10.0914687Z Running 3577 items in this shard: test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_cpu_ok_cuda, test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_exception_cuda, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_add_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_frac_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_norm_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sigmoid_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_div_reciprocal_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_check_stride_ignore_dims_of_one_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes_large_input_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_ceil_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log2_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tanh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_tensors_grouping_cuda, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_uint8 2025-12-04T13:14:10.1875181Z 2025-12-04T13:14:10.1875553Z Finished test_foreach 1/1 ... [2025-12-04 13:14:09.996568][15689.924858912], took 9.87min 2025-12-04T13:14:10.1876231Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_foreach/test_foreach-08a78fa61936b219.xml 2025-12-04T13:14:10.1876850Z Running test_pytree 1/1 ... [2025-12-04 13:14:10.159554][15690.087851271] 2025-12-04T13:14:10.1877235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:14:10.1878017Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_pytree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:14:10.159874] 2025-12-04T13:14:14.9826764Z 2025-12-04T13:14:14.9828324Z test_pytree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_pytree_1.1_d855357a70a16b8f_.log 2025-12-04T13:14:14.9850165Z Running 100 items in this shard: test/test_pytree.py::TestGenericPytree::test_aligned_public_apis, test/test_pytree.py::TestGenericPytree::test_broadcast_to_and_flatten_cxx, test/test_pytree.py::TestGenericPytree::test_broadcast_to_and_flatten_python, test/test_pytree.py::TestGenericPytree::test_enum_treespec_roundtrip_cxx, test/test_pytree.py::TestGenericPytree::test_enum_treespec_roundtrip_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_defaultdict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_defaultdict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_deque_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_deque_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_dict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_dict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_leaf_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_leaf_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_list_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_list_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_namedtuple_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_namedtuple_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_nested_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_nested_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_ordereddict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_ordereddict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_max_cxx, 
test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_max_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_min_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_min_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_tuple_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_tuple_python, test/test_pytree.py::TestGenericPytree::test_flatten_with_is_leaf_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_with_is_leaf_python, test/test_pytree.py::TestGenericPytree::test_is_namedtuple_cxx, test/test_pytree.py::TestGenericPytree::test_is_namedtuple_python, test/test_pytree.py::TestGenericPytree::test_is_structseq_cxx, test/test_pytree.py::TestGenericPytree::test_is_structseq_python, test/test_pytree.py::TestGenericPytree::test_pytree_serialize_bad_input_cxx, test/test_pytree.py::TestGenericPytree::test_pytree_serialize_bad_input_python, test/test_pytree.py::TestGenericPytree::test_register_pytree_node_cxx, test/test_pytree.py::TestGenericPytree::test_register_pytree_node_python, test/test_pytree.py::TestGenericPytree::test_tree_all_any_cxx, test/test_pytree.py::TestGenericPytree::test_tree_all_any_python, test/test_pytree.py::TestGenericPytree::test_tree_map_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_dict_order_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_dict_order_python, test/test_pytree.py::TestGenericPytree::test_tree_map_multi_inputs_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_multi_inputs_python, test/test_pytree.py::TestGenericPytree::test_tree_map_only_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_only_predicate_fn_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_only_predicate_fn_python, test/test_pytree.py::TestGenericPytree::test_tree_map_only_python, test/test_pytree.py::TestGenericPytree::test_tree_map_python, 
test/test_pytree.py::TestPythonPytree::test_constant, test/test_pytree.py::TestPythonPytree::test_constant_default_eq_error, test/test_pytree.py::TestPythonPytree::test_constant_default_hash_error, test/test_pytree.py::TestPythonPytree::test_dataclass, test/test_pytree.py::TestPythonPytree::test_deprecated_register_pytree_node, test/test_pytree.py::TestPythonPytree::test_flatten_flatten_with_key_consistency, test/test_pytree.py::TestPythonPytree::test_import_pytree_doesnt_import_optree, test/test_pytree.py::TestPythonPytree::test_key_access, test/test_pytree.py::TestPythonPytree::test_key_str, test/test_pytree.py::TestPythonPytree::test_pytree_context_serialize_bad, test/test_pytree.py::TestPythonPytree::test_pytree_custom_type_serialize, test/test_pytree.py::TestPythonPytree::test_pytree_custom_type_serialize_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_bad_protocol, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_defaultdict_enum, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_enum, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_namedtuple, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_namedtuple_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_register_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec0, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec1, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec2, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec3, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec4, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec5, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec6, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec7, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec8, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec9, 
test/test_pytree.py::TestPythonPytree::test_register_dataclass_class, test/test_pytree.py::TestPythonPytree::test_saved_serialized, test/test_pytree.py::TestPythonPytree::test_tree_flatten_with_path_is_leaf, test/test_pytree.py::TestPythonPytree::test_tree_flatten_with_path_roundtrip, test/test_pytree.py::TestPythonPytree::test_tree_leaves_with_path, test/test_pytree.py::TestPythonPytree::test_tree_map_with_path, test/test_pytree.py::TestPythonPytree::test_tree_map_with_path_multiple_trees, test/test_pytree.py::TestPythonPytree::test_treespec_equality, test/test_pytree.py::TestPythonPytree::test_treespec_repr, test/test_pytree.py::TestCxxPytree::test_pytree_custom_type_serialize, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_namedtuple, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec0, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec1, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec2, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec3, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec4, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec5, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec6, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec7, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec8, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec9, test/test_pytree.py::TestCxxPytree::test_treespec_equality, test/test_pytree.py::TestCxxPytree::test_treespec_repr 2025-12-04T13:14:14.9869570Z 2025-12-04T13:14:14.9869749Z Finished test_pytree 1/1 ... [2025-12-04 13:14:14.982529][15694.910819056], took 0.08min 2025-12-04T13:14:15.0148129Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_pytree/test_pytree-863a0f55639901d8.xml 2025-12-04T13:14:15.0507229Z Running test_namedtuple_return_api 1/1 ... 
[2025-12-04 13:14:15.050473][15694.978770887] 2025-12-04T13:14:15.0507839Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:14:15.0510926Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_namedtuple_return_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:15.050810] 2025-12-04T13:14:19.4230176Z 2025-12-04T13:14:19.4231140Z test_namedtuple_return_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_namedtuple_return_api_1.1_7b0afa1d6fad9127_.log 2025-12-04T13:14:19.4232757Z Running 3 items in this shard: test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_import_return_types, test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_namedtuple_return, test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_native_functions_yaml 2025-12-04T13:14:19.4233847Z 2025-12-04T13:14:19.4234151Z Finished test_namedtuple_return_api 1/1 ... [2025-12-04 13:14:19.422701][15699.350990319], took 0.07min 2025-12-04T13:14:19.4553955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_namedtuple_return_api/test_namedtuple_return_api-8cf83ed9877ffdac.xml 2025-12-04T13:14:19.4853017Z Running profiler/test_record_function 1/1 ... [2025-12-04 13:14:19.485054][15699.413352889] 2025-12-04T13:14:19.4853485Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:14:19.4856214Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_record_function.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:14:19.485362] 2025-12-04T13:14:22.7054564Z 2025-12-04T13:14:22.7055498Z profiler/test_record_function 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_record_function_1.1_27a3a5f054eb4856_.log 2025-12-04T13:14:22.7058306Z Running 6 items in this shard: test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_delegation_with_profiler, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function_fork, test/profiler/test_record_function.py::TestRecordFunction::test_python_dispatch_mode_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_python_subclass_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_record_function 2025-12-04T13:14:22.7060291Z 2025-12-04T13:14:22.7060529Z Finished profiler/test_record_function 1/1 ... [2025-12-04 13:14:22.705222][15702.633517556], took 0.05min 2025-12-04T13:14:22.7376319Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_record_function/profiler.test_record_function-709377f3af2db71e.xml 2025-12-04T13:14:22.7671487Z Running test_compile_benchmark_util 1/1 ... [2025-12-04 13:14:22.766897][15702.695195294] 2025-12-04T13:14:22.7671969Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:14:22.7674848Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_compile_benchmark_util.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:14:22.767212] 2025-12-04T13:14:30.4528161Z 2025-12-04T13:14:30.4529195Z test_compile_benchmark_util 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_compile_benchmark_util_1.1_37f34d6b957df075_.log 2025-12-04T13:14:30.4530667Z Running 1 items in this shard: test/test_compile_benchmark_util.py::TestCompileBenchmarkUtil::test_training_and_inference 2025-12-04T13:14:30.4531143Z 2025-12-04T13:14:30.4531424Z Finished test_compile_benchmark_util 1/1 ... [2025-12-04 13:14:30.452455][15710.380745301], took 0.13min 2025-12-04T13:14:30.4859731Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_compile_benchmark_util/test_compile_benchmark_util-827ec108afd6e51f.xml 2025-12-04T13:14:30.5698589Z Running test_set_default_mobile_cpu_allocator 1/1 ... [2025-12-04 13:14:30.569585][15710.497883589] 2025-12-04T13:14:30.5699137Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:14:30.5702062Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_set_default_mobile_cpu_allocator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:30.569947] 2025-12-04T13:14:33.7403995Z 2025-12-04T13:14:33.7404983Z test_set_default_mobile_cpu_allocator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_set_default_mobile_cpu_allocator_1.1_d39ef648b820daa5_.log 2025-12-04T13:14:33.7406516Z Running 2 items in this shard: test/test_set_default_mobile_cpu_allocator.py::TestSetDefaultMobileCPUAllocator::test_exception, test/test_set_default_mobile_cpu_allocator.py::TestSetDefaultMobileCPUAllocator::test_no_exception 2025-12-04T13:14:33.7407456Z 2025-12-04T13:14:33.7407783Z Finished test_set_default_mobile_cpu_allocator 1/1 ... 
[2025-12-04 13:14:33.740075][15713.66836338], took 0.05min 2025-12-04T13:14:33.7728034Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_set_default_mobile_cpu_allocator/test_set_default_mobile_cpu_allocator-a62b10a07a12c95d.xml 2025-12-04T13:14:33.8048744Z Running test_fake_tensor 1/1 ... [2025-12-04 13:14:33.804637][15713.732936068] 2025-12-04T13:14:33.8049190Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:14:33.8051921Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fake_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:33.804939] 2025-12-04T13:14:56.7583172Z 2025-12-04T13:14:56.7584025Z test_fake_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fake_tensor_1.1_e61dd7d666716729_.log 2025-12-04T13:14:56.7660468Z Running 288 items in this shard: test/test_fake_tensor.py::FakeTensorTest::test__adaptive_avg_pool2d_backward, test/test_fake_tensor.py::FakeTensorTest::test_alias_call, test/test_fake_tensor.py::FakeTensorTest::test_allow_meta, test/test_fake_tensor.py::FakeTensorTest::test_aten_copy_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_aten_index_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_aten_slice_scatter_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_basic, test/test_fake_tensor.py::FakeTensorTest::test_batch_tensor, test/test_fake_tensor.py::FakeTensorTest::test_binary_op_type_promotion, test/test_fake_tensor.py::FakeTensorTest::test_constructor, test/test_fake_tensor.py::FakeTensorTest::test_conv_nhwc, test/test_fake_tensor.py::FakeTensorTest::test_convert_fake_to_real, test/test_fake_tensor.py::FakeTensorTest::test_cpu_fallback, test/test_fake_tensor.py::FakeTensorTest::test_cuda_initialized, test/test_fake_tensor.py::FakeTensorTest::test_cuda_lstm, 
test/test_fake_tensor.py::FakeTensorTest::test_cudnn_rnn_with_fallback, test/test_fake_tensor.py::FakeTensorTest::test_cudnn_rnn_without_fallback, test/test_fake_tensor.py::FakeTensorTest::test_custom_op_fallback, test/test_fake_tensor.py::FakeTensorTest::test_data_dependent_operator, test/test_fake_tensor.py::FakeTensorTest::test_deepcopy, test/test_fake_tensor.py::FakeTensorTest::test_device_inplace_copy, test/test_fake_tensor.py::FakeTensorTest::test_embedding_bag_meta, test/test_fake_tensor.py::FakeTensorTest::test_export_numpy, test/test_fake_tensor.py::FakeTensorTest::test_fake_device, test/test_fake_tensor.py::FakeTensorTest::test_fake_dispatch_keys, test/test_fake_tensor.py::FakeTensorTest::test_fake_grad_copy, test/test_fake_tensor.py::FakeTensorTest::test_fake_mode_error, test/test_fake_tensor.py::FakeTensorTest::test_fast_div, test/test_fake_tensor.py::FakeTensorTest::test_fast_div_int_to_float, test/test_fake_tensor.py::FakeTensorTest::test_from_numpy, test/test_fake_tensor.py::FakeTensorTest::test_fsdp_flat_param, test/test_fake_tensor.py::FakeTensorTest::test_full, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_complex128, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_complex64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float32, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fn, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fnuz, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e5m2, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e5m2fnuz, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int16, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int32, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int64, 
test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int8, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_uint8, test/test_fake_tensor.py::FakeTensorTest::test_index_put_error, test/test_fake_tensor.py::FakeTensorTest::test_jagged_fake_to_fake_preserved, test/test_fake_tensor.py::FakeTensorTest::test_like_constructor, test/test_fake_tensor.py::FakeTensorTest::test_mixed_real_and_fake_inputs, test/test_fake_tensor.py::FakeTensorTest::test_mode, test/test_fake_tensor.py::FakeTensorTest::test_nan_to_num, test/test_fake_tensor.py::FakeTensorTest::test_nanmean_out, test/test_fake_tensor.py::FakeTensorTest::test_new, test/test_fake_tensor.py::FakeTensorTest::test_no_tag_func, test/test_fake_tensor.py::FakeTensorTest::test_non_kwarg_device, test/test_fake_tensor.py::FakeTensorTest::test_non_overlapping_stride_zero, test/test_fake_tensor.py::FakeTensorTest::test_non_parameter_grad, test/test_fake_tensor.py::FakeTensorTest::test_normalize_device, test/test_fake_tensor.py::FakeTensorTest::test_op_with_zero_dim_bypassed, test/test_fake_tensor.py::FakeTensorTest::test_out_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_parameter_instantiation, test/test_fake_tensor.py::FakeTensorTest::test_parameter_view, test/test_fake_tensor.py::FakeTensorTest::test_print_in_fake_mode, test/test_fake_tensor.py::FakeTensorTest::test_randperm, test/test_fake_tensor.py::FakeTensorTest::test_recursive_invocation, test/test_fake_tensor.py::FakeTensorTest::test_repr, test/test_fake_tensor.py::FakeTensorTest::test_same_shape_env_preserved, test/test_fake_tensor.py::FakeTensorTest::test_scalar_inputs, test/test_fake_tensor.py::FakeTensorTest::test_scan_reverse_False, test/test_fake_tensor.py::FakeTensorTest::test_scan_reverse_True, test/test_fake_tensor.py::FakeTensorTest::test_setitem, test/test_fake_tensor.py::FakeTensorTest::test_shape_take_not_device, test/test_fake_tensor.py::FakeTensorTest::test_split_return_self, 
test/test_fake_tensor.py::FakeTensorTest::test_throw, test/test_fake_tensor.py::FakeTensorTest::test_tolist, test/test_fake_tensor.py::FakeTensorTest::test_type_as, test/test_fake_tensor.py::FakeTensorTest::test_unbind_copy_out, test/test_fake_tensor.py::FakeTensorTest::test_unsqueeze_copy, test/test_fake_tensor.py::FakeTensorTest::test_upsample_bilinear_small_channels, test/test_fake_tensor.py::FakeTensorTest::test_zero_dim, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test__adaptive_avg_pool2d_backward_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_alias_call_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_allow_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_copy_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_index_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_slice_scatter_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_basic_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_batch_tensor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_binary_op_type_promotion_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_constructor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_conv_nhwc_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_convert_fake_to_real_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cpu_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cuda_initialized_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cuda_lstm_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cudnn_rnn_with_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cudnn_rnn_without_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_custom_op_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_data_dependent_operator_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_deepcopy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_device_inplace_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_embedding_bag_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_export_numpy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_dispatch_keys_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_grad_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_mode_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fast_div_int_to_float_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fast_div_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_from_numpy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fsdp_flat_param_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_full_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_complex128_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_complex64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float32_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fn_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fnuz_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e5m2_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e5m2fnuz_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int16_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int32_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int8_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_uint8_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_put_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_jagged_fake_to_fake_preserved_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_like_constructor_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_mixed_real_and_fake_inputs_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_nan_to_num_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_nanmean_out_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_new_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_no_tag_func_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_kwarg_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_overlapping_stride_zero_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_parameter_grad_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_normalize_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_op_with_zero_dim_bypassed_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_out_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_parameter_instantiation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_parameter_view_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_print_in_fake_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_randperm_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_recursive_invocation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_repr_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_same_shape_env_preserved_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scalar_inputs_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scan_reverse_False_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scan_reverse_True_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_setitem_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_shape_take_not_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_split_return_self_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_throw_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_tolist_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_type_as_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_unbind_copy_out_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_unsqueeze_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_upsample_bilinear_small_channels_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_zero_dim_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorConstHandling::test_aliased_const_write, test/test_fake_tensor.py::FakeTensorConstHandling::test_constant_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_constant_propagate_through_functions, test/test_fake_tensor.py::FakeTensorConstHandling::test_fake_tensor_batch_norm_cpu, test/test_fake_tensor.py::FakeTensorConstHandling::test_fake_tensor_in_intlist_repro, test/test_fake_tensor.py::FakeTensorConstHandling::test_inplace_add, 
test/test_fake_tensor.py::FakeTensorConstHandling::test_inplace_view_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_shared_storage_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_shared_storages, test/test_fake_tensor.py::FakeTensorConstHandling::test_simple, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_aliased_const_write_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_constant_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_constant_propagate_through_functions_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_fake_tensor_batch_norm_cpu_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_fake_tensor_in_intlist_repro_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_inplace_add_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_inplace_view_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_shared_storage_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_shared_storages_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_simple_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyCatCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyCubeCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyMulCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyMulScalarCustomOp_cuda_float32, 
test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyNMSCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyNonzeroCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySortCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySplitCopyCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyTakeCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyViewCopyCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorConverterTest::test_dead_key, test/test_fake_tensor.py::FakeTensorConverterTest::test_dead_weak_ref, test/test_fake_tensor.py::FakeTensorConverterTest::test_memoized_conversion_from_meta, test/test_fake_tensor.py::FakeTensorConverterTest::test_memoized_conversion_to_meta, test/test_fake_tensor.py::FakeTensorConverterTest::test_multiple_modes, test/test_fake_tensor.py::FakeTensorConverterTest::test_no_active_mode, test/test_fake_tensor.py::FakeTensorConverterTest::test_no_ref_cycle, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_mode_error, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_tensor_storages_non_view, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_tensor_storages_view, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_dead_key_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_dead_weak_ref_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_memoized_conversion_from_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_memoized_conversion_to_meta_propagate_real_tensors, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_multiple_modes_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_no_active_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_no_ref_cycle_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_mode_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_tensor_storages_non_view_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_tensor_storages_view_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_conv_c1_backward, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_cross_entropy_loss, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_embedding_bag_private, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_fake_gpu_no_init, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_flash_attention, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_like_ops, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_module_to, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_move_meta_tensor, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_move_module_under_fake, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_no_dispatch_with_like_function, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_non_kwarg_only_device, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_sparse_new, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_str_storage, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_tensor_constructors_all_have_kwarg_device, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_tensor_new, 
test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_conv_c1_backward_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_cross_entropy_loss_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_embedding_bag_private_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_fake_gpu_no_init_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_flash_attention_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_like_ops_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_module_to_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_move_meta_tensor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_move_module_under_fake_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_no_dispatch_with_like_function_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_non_kwarg_only_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_sparse_new_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_str_storage_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_tensor_constructors_all_have_kwarg_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_tensor_new_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorPropTest::test_fake_tensor_prop_on_nn_module, 
test/test_fake_tensor.py::FakeTensorPropTest::test_fake_tensor_prop_on_nn_module_with_optional_args, test/test_fake_tensor.py::FakeTensorPropTest::test_nan_to_num, test/test_fake_tensor.py::FakeTensorPropTest::test_nonzero_stride, test/test_fake_tensor.py::FakeTensorPropTest::test_torch_load_with_fake_mode, test/test_fake_tensor.py::FakeTensorPropTest::test_unbacked_shape_realloc, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_fake_tensor_prop_on_nn_module_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_fake_tensor_prop_on_nn_module_with_optional_args_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_nan_to_num_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_nonzero_stride_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_torch_load_with_fake_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_unbacked_shape_realloc_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorSerialization::test_serialization, test/test_fake_tensor.py::FakeTensorSerialization::test_serialization_with_tracing, test/test_fake_tensor.py::FakeTensorDispatchCache::test__upsample_bilinear2d_aa_backward_dynamic_shapes, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_aten_index, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_bypass, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_default_device, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_default_dtype, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_dispatch_key_set, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_hit, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_inplace_op, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_constants, 
test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_device, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_dtype, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_conj, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_inference, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_neg, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_memory_format, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_requires_grad, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_shape, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_storage_offset, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_stride, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_tuple_outputs, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_view_op, test/test_fake_tensor.py::FakeTensorDispatchCache::test_empty_list, test/test_fake_tensor.py::FakeTensorDispatchCache::test_fft_hfft2_issue145522, test/test_fake_tensor.py::FakeTensorDispatchCache::test_from_buffer, test/test_fake_tensor.py::FakeTensorDispatchCache::test_inference_mode, test/test_fake_tensor.py::FakeTensorDispatchCache::test_invoke_subgraph, test/test_fake_tensor.py::FakeTensorDispatchCache::test_invoke_subgraph_cacheable_inplace, test/test_fake_tensor.py::FakeTensorDispatchCache::test_meta_tensor_to_fake_cpu, test/test_fake_tensor.py::FakeTensorDispatchCache::test_shape_env_settings, test/test_fake_tensor.py::FakeTensorDispatchCache::test_unbacked_output, test/test_fake_tensor.py::FakeTensorDispatchCache::test_wrapper_tensor_subclass_different_device, test/test_fake_tensor.py::FakeTensorPreferDeviceType::test_fake_tensor_prefer_device_type, test/test_fake_tensor.py::FakeTensorPreferDeviceType::test_fake_tensor_prefer_device_type_cpu_only 2025-12-04T13:14:56.7734114Z 2025-12-04T13:14:56.7734320Z Finished test_fake_tensor 1/1 ... 
[2025-12-04 13:14:56.758572][15736.686865384], took 0.38min 2025-12-04T13:14:56.7914790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fake_tensor/test_fake_tensor-aa30317e19bcd391.xml 2025-12-04T13:14:56.8838776Z Running test_binary_ufuncs 1/1 ... [2025-12-04 13:14:56.883637][15736.81193457] 2025-12-04T13:14:56.8839228Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:14:56.8842418Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_binary_ufuncs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:56.883976] 2025-12-04T13:17:35.6066074Z 2025-12-04T13:17:35.6067100Z test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_binary_ufuncs_1.1_62e3371632b5f5b8_.log 2025-12-04T13:17:35.9829779Z Running 12917 items in this shard: test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_broadcast_empty_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_with_tail_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addcmul_scalars_as_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addsub_half_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_edgecases_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_scalar_device_unspecified_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_ops_with_scalars_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bool_tensor_comparison_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cmul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpu_tensor_pow_cuda_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cremainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_binary_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_inplace_error_msg_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_csub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cuda_tensor_pow_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cumulative_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_script_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divmul_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_exceptions_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_idiv_and_ifloordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_division_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_dunders_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_and_float_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_tensor_pow_neg_ints_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_with_nontrivial_alignment_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_long_tensor_pow_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_forward_ad_float32_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_chalf_tensor_and_cpu_scalar_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_bfloat16_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_trunc_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_lt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_w_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_out_resize_warning_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_inplace_resizing_exception_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_base_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_overloads_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_overflow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rpow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lcm_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_typing_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_tensor_pow_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___radd___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rand___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rdiv___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmod___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmul___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___ror___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rpow___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rsub___cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rxor___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmin_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_pow_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_eq_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_return_by_ref_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lt_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_max_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_min_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_h_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_he_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_laguerre_polynomial_l_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_legendre_polynomial_p_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_u_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_bfloat16_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_gradients_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_uint8
2025-12-04T13:17:36.3341375Z
2025-12-04T13:17:36.3341765Z Finished test_binary_ufuncs 1/1 ... [2025-12-04 13:17:35.624056][15895.552345694], took 2.65min
2025-12-04T13:17:36.3343037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-6264828c6395375f.xml
2025-12-04T13:17:36.3344190Z Running test_meta 2/4 ... [2025-12-04 13:17:35.927331][15895.8556273]
2025-12-04T13:17:36.3344687Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:17:36.3345889Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:17:35.927699]
2025-12-04T13:45:51.4353286Z
2025-12-04T13:45:51.4354368Z test_meta 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_2.4_5af3c72f6896ae42_.log
2025-12-04T13:45:51.6815292Z Running 10352 items in this shard: test/test_meta.py::TestMetaConverter::test_channels_last, test/test_meta.py::TestMetaConverter::test_channels_last_leaf, test/test_meta.py::TestMetaConverter::test_view_dtype, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs__conversions_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs__conversions_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_special_zeta_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_complex_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_normal_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_elu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_multinomial_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool1d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_number_mean_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_hamming_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_lengths_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_reciprocal_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_right_shift_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_div_trunc_rounding_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_amax_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_power_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mode_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_gaussian_nll_loss_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_permute_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_gaussian_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_std_mean_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bernoulli_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmax_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_batch_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nextafter_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mse_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___radd___cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_allclose_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diagonal_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gcd_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_factor_ex_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mode_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_elu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_qr_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_slice_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_square_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zero__cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_3d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_log_softmax_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_lowrank_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask1_cuda, test/test_meta.py::TestMetaCUDA::test_inplace_bin_ops_error_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_singular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_neg_3_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_nuttall_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bernoulli_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frexp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svdvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorsolve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_number_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_nuttall_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_lowrank_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_stride_for_index_Tensor_cuda 2025-12-04T13:45:51.9208215Z 2025-12-04T13:45:51.9208419Z Finished test_meta 2/4 ... 
[2025-12-04 13:45:51.446836][17591.375127138], took 28.26min 2025-12-04T13:45:51.9209048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_meta/test_meta-be563441d2ad6907.xml 2025-12-04T13:45:53.0403011Z Uploading artifacts took 1.29 seconds 2025-12-04T13:45:53.0406602Z Running test_fx 1/1 ... [2025-12-04 13:45:53.040472][17592.968768044] 2025-12-04T13:45:53.0406944Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:45:53.0410872Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:45:53.040865] 2025-12-04T13:48:51.7397932Z 2025-12-04T13:48:51.7398652Z test_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_1.1_c9e7dc6459df6851_.log 2025-12-04T13:48:51.7726290Z Running 1280 items in this shard: test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cuda, 
test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cuda, test/test_fx.py::TestCSEPass::test_banned_list, test/test_fx.py::TestCSEPass::test_empty, test/test_fx.py::TestCSEPass::test_immutable_list_multiple_entries, test/test_fx.py::TestCSEPass::test_immutable_list_type, test/test_fx.py::TestCSEPass::test_kwarg, test/test_fx.py::TestCSEPass::test_nested_immutable_list_type, test/test_fx.py::TestCSEPass::test_nochange, test/test_fx.py::TestCSEPass::test_rand_like, test/test_fx.py::TestCSEPass::test_rand_n, test/test_fx.py::TestCSEPass::test_random, test/test_fx.py::TestCSEPass::test_simple, test/test_fx.py::TestCSEPass::test_simple_2, test/test_fx.py::TestCSEPass::test_simple_multiple_same_ops, test/test_fx.py::TestCSEPass::test_two_args, test/test_fx.py::TestCSEPass::test_two_args_default, test/test_fx.py::TestDCE::test_dead_chain, test/test_fx.py::TestDCE::test_dead_getattr, test/test_fx.py::TestDCE::test_dead_placeholder, test/test_fx.py::TestDCE::test_dead_placeholder_with_user, test/test_fx.py::TestDCE::test_impure_custom, test/test_fx.py::TestDCE::test_impure_kwargs, test/test_fx.py::TestDCE::test_impure_nodes_args, test/test_fx.py::TestDCE::test_impure_random, test/test_fx.py::TestDCE::test_keep_collectives, test/test_fx.py::TestDCE::test_keep_collectives_no_overload, test/test_fx.py::TestDCE::test_keep_module_with_side_effects, test/test_fx.py::TestDCE::test_keep_setitem, test/test_fx.py::TestDCE::test_keep_torch_assert, test/test_fx.py::TestDCE::test_simple, test/test_fx.py::TestConstFold::test_check_inline_non_const, test/test_fx.py::TestConstFold::test_check_inline_non_const_mult_return, 
test/test_fx.py::TestConstFold::test_check_skip_folding_quant_dequant_pattern, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_no_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_placeholder_reordered, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr_three_input, test/test_fx.py::TestConstFold::test_const_fold_has_inlined_call_module_node, test/test_fx.py::TestConstFold::test_const_fold_module_attr, test/test_fx.py::TestConstFold::test_const_fold_multi_const_folded_attrs, test/test_fx.py::TestConstFold::test_const_fold_noop, test/test_fx.py::TestConstFold::test_const_fold_partial_graph, test/test_fx.py::TestConstFold::test_const_fold_submod_hierarchy, test/test_fx.py::TestConstFold::test_const_fold_tensor_meta, test/test_fx.py::TestConstFold::test_const_fold_unused_placeholder, test/test_fx.py::TestConstFold::test_dict_output, test/test_fx.py::TestConstFold::test_do_not_fold_impure_subgraph, test/test_fx.py::TestConstFold::test_fold_module, test/test_fx.py::TestConstFold::test_fold_pure_subgraph, test/test_fx.py::TestConstFold::test_retain_node_meta, test/test_fx.py::TestConstFold::test_three_outputs, test/test_fx.py::TestConstFold::test_two_outputs, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_dim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_ndim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_nelement_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_numel_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_shape_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_size_const, test/test_fx.py::AnnotationsTest::test_annotate, test/test_fx.py::AnnotationsTest::test_annotations, test/test_fx.py::AnnotationsTest::test_broadcasting1, 
test/test_fx.py::AnnotationsTest::test_broadcasting2, test/test_fx.py::AnnotationsTest::test_broadcasting3, test/test_fx.py::AnnotationsTest::test_consistency, test/test_fx.py::AnnotationsTest::test_precision, test/test_fx.py::TypeCheckerTest::test_flatten_fully_static, test/test_fx.py::TypeCheckerTest::test_resnet50, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast_2, test/test_fx.py::TypeCheckerTest::test_type_check_add_false, test/test_fx.py::TypeCheckerTest::test_type_check_add_true, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_scalar, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_false, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_symbolic, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2_fully_static, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_types, test/test_fx.py::TypeCheckerTest::test_type_check_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_flatten3, test/test_fx.py::TypeCheckerTest::test_type_check_flatten_2, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true_param_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_true, test/test_fx.py::TypeCheckerTest::test_type_check_symbolic_inferenceconv2D_maxpool2d_flatten, 
test/test_fx.py::TypeCheckerTest::test_type_check_transpose_False, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_true, test/test_fx.py::TypeCheckerTest::test_type_maxpool2d_fully_static, test/test_fx.py::TypeCheckerTest::test_type_typechecl_maxpool2d_3dinput, test/test_fx.py::TypeCheckerTest::test_typecheck_basicblock, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_function, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_module, test/test_fx.py::TestMatcher::test_split_to_graph_and_name_node_map, test/test_fx.py::TestMatcher::test_subgraph_matcher_ignore_literals, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_attributes, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list_bad, test/test_fx.py::TestMatcher::test_variatic_arg_matching, test/test_fx.py::TestPassManager::test_pass_manager, test/test_fx.py::TestPassManager::test_pass_manager_bad_checks, test/test_fx.py::TestPassManager::test_pass_manager_checks, test/test_fx.py::TestPassManager::test_pass_manager_error, test/test_fx.py::TestPassManager::test_this_before_that_pass_constraint, test/test_fx.py::TestPassManager::test_topological_sort, test/test_fx.py::TestSourceMatcher::test_legalize_slice, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear, 
test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_True, test/test_fx.py::TestSubgraphRewriter::test_matching_pattern_with_list_type_arg, test/test_fx.py::TestSubgraphRewriter::test_matching_variable_arguments, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_callback, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_filters, test/test_fx.py::TestSubgraphRewriter::test_replaced_nodes, test/test_fx.py::TestSubgraphRewriter::test_replacement_with_attrs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_annotations_int, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_call_method, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_internal_pattern_nodes_cannot_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_local_revert, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_multiple_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_nodes_with_kwargs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_is_entire_graph, 
test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_output_pattern_node_can_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_placeholder_matching, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_preserves_logic, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_consecutive_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_duplicated_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_multiple_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replaces_referenced_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_single_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_traced_as_callable, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_oneliner_pattern, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_overlapping_matches, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_trivial_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_args, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_results, test/test_fx.py::TestFX::test_all_input_nodes, test/test_fx.py::TestFX::test_annotation_with_future, test/test_fx.py::TestFX::test_annotations_empty_tuple, test/test_fx.py::TestFX::test_annotations_with_forward_references, test/test_fx.py::TestFX::test_annotations_with_no_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_internal_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_no_internal_forward_references, test/test_fx.py::TestFX::test_args_kwargs, test/test_fx.py::TestFX::test_args_kwargs_no_self, test/test_fx.py::TestFX::test_ast_rewriter_reassigns_submodules, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert, 
test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert_with_message, test/test_fx.py::TestFX::test_ast_rewriter_wrap, test/test_fx.py::TestFX::test_ast_rewriter_wrap_fn_directly, test/test_fx.py::TestFX::test_ast_rewriter_wrap_with_submodule, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_autowrap_functions, test/test_fx.py::TestFX::test_concrete_arg_none_assert, test/test_fx.py::TestFX::test_construct_root_dict, test/test_fx.py::TestFX::test_control_flow_tracing, test/test_fx.py::TestFX::test_copy_it, test/test_fx.py::TestFX::test_copy_no_remap, test/test_fx.py::TestFX::test_ctx_mgr, test/test_fx.py::TestFX::test_custom_codegen, test/test_fx.py::TestFX::test_custom_codegen_with_transformer, test/test_fx.py::TestFX::test_custom_import, test/test_fx.py::TestFX::test_custom_proxy_dynamic_value, test/test_fx.py::TestFX::test_custom_proxy_input_dependent_control_flow, test/test_fx.py::TestFX::test_custom_proxy_type, test/test_fx.py::TestFX::test_custom_proxy_type_literal, test/test_fx.py::TestFX::test_custom_traceback_not_raised_when_exception_source_is_submodule, test/test_fx.py::TestFX::test_custom_traceback_raised_when_exception_source_is_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graph_with_tracer_cls, test/test_fx.py::TestFX::test_deepcopy_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graphmodule_with_transform, test/test_fx.py::TestFX::test_deepcopy_no_recursion, test/test_fx.py::TestFX::test_deepcopy_recursion_depth, test/test_fx.py::TestFX::test_deepcopy_tracer, test/test_fx.py::TestFX::test_deepcopy_with_submods_params, test/test_fx.py::TestFX::test_delete_unused_submodules_leaf, test/test_fx.py::TestFX::test_delete_unused_values, test/test_fx.py::TestFX::test_dict, test/test_fx.py::TestFX::test_direct_param_use, test/test_fx.py::TestFX::test_disallow_override, test/test_fx.py::TestFX::test_ellipsis, 
test/test_fx.py::TestFX::test_empty_graph_codegen, test/test_fx.py::TestFX::test_enum, test/test_fx.py::TestFX::test_erase_node_error, test/test_fx.py::TestFX::test_example_shape_prop, test/test_fx.py::TestFX::test_find_uses, test/test_fx.py::TestFX::test_fn_type_annotation_empty, test/test_fx.py::TestFX::test_fn_type_annotations, test/test_fx.py::TestFX::test_fx_and_or, test/test_fx.py::TestFX::test_fx_create_arg, test/test_fx.py::TestFX::test_fx_shifts, test/test_fx.py::TestFX::test_fx_stateless, test/test_fx.py::TestFX::test_get_torch_func_signature, test/test_fx.py::TestFX::test_getitem, test/test_fx.py::TestFX::test_getitem_subproc, test/test_fx.py::TestFX::test_graph_edit_with_proxy, test/test_fx.py::TestFX::test_graph_fns, test/test_fx.py::TestFX::test_graph_module, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_dict_init, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_mod_init, test/test_fx.py::TestFX::test_graph_module_replicate_for_dp, test/test_fx.py::TestFX::test_graph_unique_names, test/test_fx.py::TestFX::test_graph_unique_names_manual, test/test_fx.py::TestFX::test_immutable_dict_pytree_ops, test/test_fx.py::TestFX::test_immutable_list_pytree_ops, test/test_fx.py::TestFX::test_imul_code_print, test/test_fx.py::TestFX::test_inf_nan, test/test_fx.py::TestFX::test_inf_nan_kwds, test/test_fx.py::TestFX::test_informative_co_filename, test/test_fx.py::TestFX::test_inline_graph, test/test_fx.py::TestFX::test_insert_arg, test/test_fx.py::TestFX::test_insertion_point, test/test_fx.py::TestFX::test_interpreter, test/test_fx.py::TestFX::test_interpreter_boxed_run_argument_validation, test/test_fx.py::TestFX::test_interpreter_default_args, test/test_fx.py::TestFX::test_interpreter_gc_values, test/test_fx.py::TestFX::test_interpreter_noop_resnet18, test/test_fx.py::TestFX::test_interpreter_not_enough_args, test/test_fx.py::TestFX::test_interpreter_onthefly_swap, test/test_fx.py::TestFX::test_interpreter_other_graph, 
test/test_fx.py::TestFX::test_interpreter_partial_eval, test/test_fx.py::TestFX::test_interpreter_run_node_override, test/test_fx.py::TestFX::test_interpreter_star_args, test/test_fx.py::TestFX::test_interpreter_with_codegen, test/test_fx.py::TestFX::test_layout, test/test_fx.py::TestFX::test_leaf_module, test/test_fx.py::TestFX::test_lineno_map, test/test_fx.py::TestFX::test_matmul_tracing, test/test_fx.py::TestFX::test_metadata_on_ph, test/test_fx.py::TestFX::test_module_deepcopy_edit_nodes, test/test_fx.py::TestFX::test_move_before, test/test_fx.py::TestFX::test_multi_insert_point, test/test_fx.py::TestFX::test_multiple_default_args, test/test_fx.py::TestFX::test_named_tuple_inlined, test/test_fx.py::TestFX::test_namedtuple_return_qualname, test/test_fx.py::TestFX::test_namedtuple_return_trace, test/test_fx.py::TestFX::test_native_callable, test/test_fx.py::TestFX::test_nn_module_stack, test/test_fx.py::TestFX::test_no_mutation, test/test_fx.py::TestFX::test_node_tagging, test/test_fx.py::TestFX::test_nonetype_annotation, test/test_fx.py::TestFX::test_partial_trace, test/test_fx.py::TestFX::test_pickle_custom_import, test/test_fx.py::TestFX::test_pickle_graphmodule, test/test_fx.py::TestFX::test_pickle_nonetype_annotation, test/test_fx.py::TestFX::test_pickle_torch_custom_ops, test/test_fx.py::TestFX::test_prepend_does_not_leak, test/test_fx.py::TestFX::test_prepend_self, test/test_fx.py::TestFX::test_pretty_print, test/test_fx.py::TestFX::test_pretty_print_graph, test/test_fx.py::TestFX::test_pretty_print_node, test/test_fx.py::TestFX::test_pretty_print_targets, test/test_fx.py::TestFX::test_print_graph, test/test_fx.py::TestFX::test_profiler_multiple_modules, test/test_fx.py::TestFX::test_profiler_nested_graph_modules, test/test_fx.py::TestFX::test_profiler_ranges_side_effect, test/test_fx.py::TestFX::test_profiler_stack_trace_augmentation, test/test_fx.py::TestFX::test_proxy_deepcopy_with_tracer, test/test_fx.py::TestFX::test_proxy_deepcopy_without_tracer, 
test/test_fx.py::TestFX::test_pytree, test/test_fx.py::TestFX::test_pytree_concrete, test/test_fx.py::TestFX::test_reassign_args_kwargs_uses, test/test_fx.py::TestFX::test_regular_and_default_args, test/test_fx.py::TestFX::test_remove_uses, test/test_fx.py::TestFX::test_remove_uses_with_custom_filter, test/test_fx.py::TestFX::test_replace_input, test/test_fx.py::TestFX::test_replace_uses, test/test_fx.py::TestFX::test_reserved_getattr, test/test_fx.py::TestFX::test_return_tuple, test/test_fx.py::TestFX::test_return_type_exists, test/test_fx.py::TestFX::test_return_type_exists_pre_pep585, test/test_fx.py::TestFX::test_script_method_trace, test/test_fx.py::TestFX::test_script_tensor_constant, test/test_fx.py::TestFX::test_sequential, test/test_fx.py::TestFX::test_shape_prop_aggregate, test/test_fx.py::TestFX::test_shape_prop_layout, test/test_fx.py::TestFX::test_shape_prop_layout_3d, test/test_fx.py::TestFX::test_shape_prop_unbacked_sym, test/test_fx.py::TestFX::test_single_default_arg, test/test_fx.py::TestFX::test_snake_case, test/test_fx.py::TestFX::test_sqrt, test/test_fx.py::TestFX::test_stack_traces, test/test_fx.py::TestFX::test_stack_traces_with_transformer, test/test_fx.py::TestFX::test_string_literal_return, test/test_fx.py::TestFX::test_submodule_manipulation_API, test/test_fx.py::TestFX::test_symbolic_trace_assert, test/test_fx.py::TestFX::test_symbolic_trace_sequential, test/test_fx.py::TestFX::test_tensor_attribute, test/test_fx.py::TestFX::test_tensor_attribute_coalseced, test/test_fx.py::TestFX::test_tensor_constant, test/test_fx.py::TestFX::test_throw_out_variant, test/test_fx.py::TestFX::test_torch_custom_ops, test/test_fx.py::TestFX::test_torch_fx_getattr, test/test_fx.py::TestFX::test_torch_fx_len, test/test_fx.py::TestFX::test_torch_op_overloads, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx_tensor_arg, test/test_fx.py::TestFX::test_trace_buffer_slice, 
test/test_fx.py::TestFX::test_trace_dict_int_keys, test/test_fx.py::TestFX::test_trace_dict_proxy_keys, test/test_fx.py::TestFX::test_trace_fn_constant, test/test_fx.py::TestFX::test_trace_function, test/test_fx.py::TestFX::test_trace_multiple_funcs, test/test_fx.py::TestFX::test_trace_return_dataclass, test/test_fx.py::TestFX::test_trace_return_dataclass_nested, test/test_fx.py::TestFX::test_trace_return_namedtuple, test/test_fx.py::TestFX::test_tracing_graphmodules_as_leaf_submodules, test/test_fx.py::TestFX::test_transformer_multi_outputs, test/test_fx.py::TestFX::test_transformer_noop, test/test_fx.py::TestFX::test_transformer_op_swap, test/test_fx.py::TestFX::test_transformer_preserves_nn_module_stack_for_get_attr, test/test_fx.py::TestFX::test_tuple_no_subscript, test/test_fx.py::TestFX::test_typename_print, test/test_fx.py::TestFX::test_typename_print_pre_pep585, test/test_fx.py::TestFX::test_typename_print_union, test/test_fx.py::TestFX::test_unpack, test/test_fx.py::TestFX::test_unpack_dict_better_error, test/test_fx.py::TestFX::test_unpack_list_better_error, test/test_fx.py::TestFX::test_update_args_api, test/test_fx.py::TestFX::test_update_args_kwargs_yells_at_you, test/test_fx.py::TestFX::test_update_kwargs_api, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_function, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_module, test/test_fx.py::TestFX::test_varargs_concrete, test/test_fx.py::TestFX::test_wrap, test/test_fx.py::TestFX::test_wrap_decorated_function, test/test_fx.py::TestFX::test_wrap_fn_directly, test/test_fx.py::TestFX::test_wrap_with_submodule, test/test_fx.py::TestFX::test_wrapped_method, test/test_fx.py::TestFX::test_wrapped_retrace, test/test_fx.py::TestFX::test_wrapped_via_decorator, test/test_fx.py::TestFX::test_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_wrong_target_type, test/test_fx.py::TestFX::test_wrong_topo, 
test/test_fx.py::TestFXAPIBackwardCompatibility::test_adding_side_effect_function, test/test_fx.py::TestFXAPIBackwardCompatibility::test_class_member_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_function_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_preserve_unused_attr_after_unpickle, test/test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_affine_grid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_batch_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy_with_logits, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu_, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_tbc, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_similarity, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_ctc_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding_bag, test/test_fx.py::TestFunctionalTracing::test_nn_functional_feature_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d_with_indices, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_gaussian_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_glu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grid_sample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_group_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gumbel_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardswish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hinge_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_huber_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_instance_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_interpolate, test/test_fx.py::TestFunctionalTracing::test_nn_functional_kl_div, test/test_fx.py::TestFunctionalTracing::test_nn_functional_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_layer_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_linear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_local_response_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_log_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_logsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool3d, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_margin_ranking_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mse_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_head_attention_forward, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_native_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_normalize, test/test_fx.py::TestFunctionalTracing::test_nn_functional_one_hot, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pad, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pairwise_distance, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pdist, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_unshuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_poisson_nll_loss, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_prelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu6, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rms_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_dot_product_attention, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_silu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_smooth_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmin, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softplus, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_with_distance_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_unfold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_nearest, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_H_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_T_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___getitem___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___radd___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rdiv___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmatmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmod___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rpow___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rsub___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__chunk_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__softmax_backward_data_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__upsample_bilinear2d_aa_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_abs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcdiv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_decomposed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_alias_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_all_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_allclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amin_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_aminmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_angle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_any_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_arange_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argsort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argwhere_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_1d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_baddbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bernoulli_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bfloat16_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_block_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bool_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_shapes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bucketize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_byte_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cartesian_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cauchy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdouble_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ceil_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cfloat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chalf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_char_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_inverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_max_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_min_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clone_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_column_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_combinations_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_physical_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_constant_pad_nd_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_contiguous_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_copysign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_corrcoef_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_count_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cov_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumulative_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_deg2rad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_embed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagflat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_copy_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diff_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_digamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_floor_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_double_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_einsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_permuted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eq_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_equal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expm1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eye_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flip_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fliplr_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flipud_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gather_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ge_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geometric_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geqrf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gradient_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_2d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_half_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hash_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_heaviside_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_histc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hypot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igammac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amin_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_inner_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_int_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isfinite_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isnan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isneginf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isposinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isreal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_item_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_2inputs_2outputs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_return_by_ref_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_unary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kron_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kthvalue_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ldexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_le_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lerp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lgamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cond_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_det_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eig_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvals_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvalsh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_householder_product_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_multi_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_slogdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svdvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vander_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vecdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vector_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log10_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log1p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logcumsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_and_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_not_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_or_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_xor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_long_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_unpack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mH_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mT_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumprod_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_var_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matrix_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_pool2d_with_indices_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_maximum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_list_of_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_minimum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mode_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_movedim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_msort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_multinomial_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nan_to_num_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmedian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanquantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nansum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_dropout_backward_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ne_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nextafter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_alpha_dropout_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_celu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose2d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_similarity_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_ctc_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_elu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_glu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_group_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardswish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardtanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_instance_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bilinear_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_kl_div_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_logsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool3d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mse_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_head_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_circular_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_constant_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_reflect_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_negative_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pairwise_distance_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rrelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_scaled_dot_product_attention_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_selu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_silu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_tanhshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_bilinear_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_static_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_fro_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_inf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_nuc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_in_place_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_number_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ormqr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_outer_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pca_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pinverse_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polar_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_positive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_quantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rad2deg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rand_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ravel_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_real_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reciprocal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_remainder_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_renorm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_interleave_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize_as__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_roll_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rot90_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_0_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scalar_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_searchsorted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sgn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_short_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_bartlett_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_gaussian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hann_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signbit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinc_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_mm_reduce_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_airy_ai_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_v_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_entr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_erfcx_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i0e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_legendre_polynomial_p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_log_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtr_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtri_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_spherical_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_xlog1py_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_zeta_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_list_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_square_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_multiple_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_to_size_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_along_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensor_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensordot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_sparse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_topk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapz_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triangular_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tril_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_true_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trunc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unflatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_uniform_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_consecutive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_where_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_xlogy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zero__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_like_cuda_float32, test/test_fx.py::TestVisionTracing::test_torchvision_models_alexnet, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_base, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_small, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_tiny, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet121, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet161, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet169, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet201, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_320_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fcos_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_keypointrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssd300_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssdlite320_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b0, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b1, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b2, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b3, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b4, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b5, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b6, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b7, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_l, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_m, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_googlenet, test/test_fx.py::TestVisionTracing::test_torchvision_models_inception_v3, test/test_fx.py::TestVisionTracing::test_torchvision_models_maxvit_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_75, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_3, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_128gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_32gf, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet152, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet18, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet34, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_32x8d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_64x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext50_32x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_lraspp_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x2_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_1, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_b, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mc3_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v1_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r2plus1d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r3d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_s3d, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_h_14, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_32, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet101_2, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet50_2
2025-12-04T13:48:51.8041002Z
2025-12-04T13:48:51.8041184Z Finished test_fx 1/1 ... [2025-12-04 13:48:51.741234][17771.669530763], took 2.98min
2025-12-04T13:48:51.8041784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fx/test_fx-26e481d16f3bec04.xml
2025-12-04T13:48:51.9046645Z Running test_ops_gradients 2/4 ... [2025-12-04 13:48:51.904441][17771.832739937]
2025-12-04T13:48:51.9047069Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:48:51.9049930Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:48:51.904759]
2025-12-04T13:54:03.2137401Z
2025-12-04T13:54:03.2138311Z test_ops_gradients 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_2.4_85b1c0ac8f503e20_.log
2025-12-04T13:54:03.2501387Z Running 1350 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rsub___cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_arange_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_baddbmm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cdouble_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ihfftn_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fmod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_frexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_igammac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_inner_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isnan_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isneginf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_det_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_pinv_singular_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logcumsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_argmin_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanmedian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_layer_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_glu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_hardshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_leaky_relu_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_mish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_tanhshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_nuc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ormqr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_polar_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_searchsorted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_gaussian_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_general_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_unbiased_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_take_along_dim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_copy_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___radd___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmatmul___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmod___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__upsample_bilinear2d_aa_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bool_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cfloat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cond_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_count_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_equal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_hfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ihfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hash_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_histc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_ex_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_rank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_long_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_min_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_kl_div_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_linear_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rms_norm_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_remainder_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_renorm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_nuttall_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tile_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_while_loop_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_T_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__native_batch_norm_legit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_angle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_as_strided_copy_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bfloat16_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cauchy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ceil_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_max_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_min_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_constant_pad_nd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eq_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_equal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_gather_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_half_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_hstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_igamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_igammac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_imag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_invoke_subgraph_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigh_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_singular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_vector_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log1p_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_softmax_with_dtype_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_triple_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matmul_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_nll_loss_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_qr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scalar_tensor_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scan_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_legendre_polynomial_p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_i0_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_topk_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unbind_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unflatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unfold_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_xlogy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_H_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_H_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmv_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_alias_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_argwhere_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bfloat16_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chunk_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cummin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagonal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_floor_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_trunc_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_double_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dsplit_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expm1_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_power_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ge_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_geometric_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_grid_sampler_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isinf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isneginf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_unary_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorsolve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log10_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_select_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_embedding_bag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_leaky_relu_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_relu6_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_outer_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rand_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ravel_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_roll_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rot90_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_bartlett_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_with_sizes_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_copy_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_true_divide_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_consecutive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vsplit_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rmod___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rpow___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_decomposed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_any_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ceil_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_solve_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_eq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_expand_copy_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_grid_sampler_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_igamma_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_det_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logdet_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose1d_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_l1_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_tanhshrink_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_outer_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_reshape_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize_as__cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_decimals_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sgn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_general_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sinc_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_slice_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unbind_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_mean_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zeros_cuda_complex128 2025-12-04T13:54:03.2850080Z 2025-12-04T13:54:03.2850310Z Finished test_ops_gradients 2/4 ... [2025-12-04 13:54:03.215429][18083.143720872], took 5.19min 2025-12-04T13:54:03.2851014Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-e4282282966a7ff9.xml 2025-12-04T13:54:03.3537684Z Running test_nestedtensor 3/4 ... [2025-12-04 13:54:03.353540][18083.281838322] 2025-12-04T13:54:03.3538125Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:54:03.3541091Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:54:03.353849] 2025-12-04T14:00:01.7448863Z 2025-12-04T14:00:01.7449810Z test_nestedtensor 3/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_3.4_dff7d83ca5d5d6c7_.log 2025-12-04T14:00:01.7583942Z Running 398 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_is_contiguous, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_zeros_like, test/test_nestedtensor.py::TestNestedTensor::test_unbind_3, test/test_nestedtensor.py::TestNestedInt::test_comparisons, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_uint8, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_masked_select_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_cos_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isposinf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sgn_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sqrt_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_sub_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_indexing_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_edge_case_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_generates_leaf_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_mask_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_to_buffer_series_ops_grad_with_broadcast_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_autograd_function_with_None_grad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_with_nested_int_second_arg_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_propagated_dynamic_max_seq_len_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_in_inference_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_construction_from_list_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_copy__cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_narrow_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_profiler_sequence_nr_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_record_stream_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_autocast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_recompile_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_squeeze_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_1_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_view_ragged_idx_not_one_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cosh_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_silu_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_maximum_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_divide_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_reciprocal_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unflatten_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hash_tensor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isclose_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_xor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_long_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_bag_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_laguerre_polynomial_l_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_log_ndtr_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_input_mutation_backward_cuda
2025-12-04T14:00:01.7710442Z
2025-12-04T14:00:01.7710664Z Finished test_nestedtensor 3/4 ... [2025-12-04 14:00:01.745452][18441.673744426], took 5.97min
2025-12-04T14:00:01.7791189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-59ab3583fe1f80dc.xml
2025-12-04T14:00:01.8626241Z Running functorch/test_control_flow 4/4 ... [2025-12-04 14:00:01.862370][18441.790667554]
2025-12-04T14:00:01.8627012Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:00:01.8629767Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_control_flow.py', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:00:01.862668]
2025-12-04T14:08:01.2653077Z
2025-12-04T14:08:01.2653994Z functorch/test_control_flow 4/4 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_control_flow_4.4_38ed588b114098c9_.log
2025-12-04T14:08:01.2882400Z Running 478 items in this shard: test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_different_pytree_output, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_grad_through_cond, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_nested, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_pytree_not_all_inputs_used, test/functorch/test_control_flow.py::TestControlFlow::test_cond_in_forloop, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_no_grad_output, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_simple_partial_grad, test/functorch/test_control_flow.py::TestControlFlow::test_map_list_in_out, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_carry_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_none_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_random_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_complex_cpu, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cpu_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_int64, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_carry_shape, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_False_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_carry_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_mutation, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_1_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_3_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_while_loop_gpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cpu_combine_mode_generic_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_generic_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_dynamic_shape_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_wrong_pytree, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_autograd_backward, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_eager_run_with_item, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_false_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_true_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_output_alias_input, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_traced_not_nested_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_1, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_module_param_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_tensor_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_elem_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_merge_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_mixed_batch_dims, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_unbatched_x, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_vmap_scan_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_real, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_list, test/functorch/test_control_flow.py::TestControlFlowTraced::test_vmap_vmap_boolcond_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_pytree_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_mutation, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_cpp, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_functorch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_int, test/functorch/test_control_flow.py::TestHopSchema::test_schema_tree_spec, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_GraphModule, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymBool, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymInt, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_additional_inputs 2025-12-04T14:08:01.3090133Z 2025-12-04T14:08:01.3090396Z Finished functorch/test_control_flow 4/4 ... [2025-12-04 14:08:01.273013][18921.201304942], took 7.99min 2025-12-04T14:08:01.3091185Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-5ef9f5430d478fb5.xml 2025-12-04T14:08:02.9517712Z Uploading artifacts took 1.50 seconds 2025-12-04T14:08:02.9520068Z Running complex_tensor/test_complex_tensor 3/3 ... [2025-12-04 14:08:02.951762][18922.880058764] 2025-12-04T14:08:02.9520621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:08:02.9523736Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'complex_tensor/test_complex_tensor.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:08:02.952120] 2025-12-04T14:11:18.7726133Z 2025-12-04T14:11:18.7727980Z complex_tensor/test_complex_tensor 3/3 was successful, full logs can be found in artifacts with path test/test-reports/complex_tensor.test_complex_tensor_3.3_18c49b70545b8444_.log 2025-12-04T14:11:18.7791323Z Running 193 items in this shard: test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_abs_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_abs_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_abs_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_acos_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_acosh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_acosh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_add_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_addmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_addmm_decomposed_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_all_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_any_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_asinh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atan_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atan_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atanh_cuda_complex32, 
test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atanh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_bmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_clone_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_conj_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_conj_physical_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cos_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cosh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cosh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cumprod_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_diagonal_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_div_no_rounding_mode_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_div_no_rounding_mode_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_eq_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_eq_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_eq_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_expm1_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_flatten_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_flip_cuda_complex64, 
test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_full_like_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_index_add_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_index_select_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_index_select_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_isfinite_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_isinf_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_isnan_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_log_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_log_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_logical_and_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_logical_xor_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_mm_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_mul_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_ne_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_ne_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_new_zeros_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_permute_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_pow_cuda_complex64, 
test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_prod_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_randn_like_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_real_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_repeat_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_rsqrt_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_rsqrt_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_rsub_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_scatter_add_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_select_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sgn_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sin_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sin_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_slice_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_list_args_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_with_sizes_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sqrt_cuda_complex32, 
test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_squeeze_multiple_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_stack_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_stack_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sub_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sub_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sum_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sum_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sum_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_tanh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_true_divide_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_var_unbiased_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_where_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_abs_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_acos_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_acosh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_asin_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_asinh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_atan_cuda_complex128, 
test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_atanh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_conj_physical_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_cos_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_cumsum_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_dot_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_gather_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_imag_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_log1p_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_mm_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_mul_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_permute_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_repeat_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_scatter_add_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_split_list_args_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_squeeze_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_stack_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_sub_cuda_complex128, 
test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_sum_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_unsqueeze_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_var_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_zero__cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_abs_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_acosh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_addmm_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_addmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_addmm_decomposed_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_all_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_angle_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asin_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asin_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asin_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asinh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atan_cuda_complex128, 
test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atan_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atanh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atanh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_bmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cat_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_clone_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_conj_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_conj_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_constant_pad_nd_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cos_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cosh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cosh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cumprod_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_diagonal_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_diagonal_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_eq_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_exp_cuda_complex32, 
test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_expand_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_expm1_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_expm1_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_flip_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_full_like_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_gather_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_imag_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_index_select_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_isclose_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_isinf_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_isnan_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log1p_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log1p_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_logical_not_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_logical_xor_cuda_complex64, 
test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_masked_fill_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_masked_scatter_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_masked_scatter_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_mean_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_mul_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_ne_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_ne_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_neg_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_neg_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_new_zeros_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_permute_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_pow_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_prod_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_prod_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_reciprocal_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_repeat_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_rsqrt_cuda_complex128, 
test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sin_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sinh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_slice_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_split_list_args_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sqrt_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sqrt_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_squeeze_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_stack_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_stack_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sub_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_t_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_tanh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_transpose_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_var_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_var_unbiased_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_view_as_real_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_view_as_real_cuda_complex64, 
test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_view_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_where_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_zeros_like_cuda_complex32 2025-12-04T14:11:18.7852338Z 2025-12-04T14:11:18.7852713Z Finished complex_tensor/test_complex_tensor 3/3 ... [2025-12-04 14:11:18.772805][19118.701102271], took 3.26min 2025-12-04T14:11:18.8063012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/complex_tensor.test_complex_tensor/complex_tensor.test_complex_tensor-b8215f419723e2db.xml 2025-12-04T14:11:18.8913427Z Running optim/test_optim 1/1 ... [2025-12-04 14:11:18.891097][19118.819396713] 2025-12-04T14:11:18.8913913Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:11:18.8916863Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:11:18.891408] 2025-12-04T14:11:21.5614781Z 2025-12-04T14:11:21.5615911Z optim/test_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_optim_1.1_96914e6399c32275_.log 2025-12-04T14:11:21.5616625Z 2025-12-04T14:11:21.5616921Z Finished optim/test_optim 1/1 ... [2025-12-04 14:11:21.561232][19121.489528287], took 0.04min 2025-12-04T14:11:21.5946193Z Running torch_np/numpy_tests/fft/test_pocketfft 1/1 ... 
[2025-12-04 14:11:21.594381][19121.522680798] 2025-12-04T14:11:21.5946955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:11:21.5949588Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/fft/test_pocketfft.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:11:21.594692] 2025-12-04T14:11:27.0181565Z 2025-12-04T14:11:27.0182688Z torch_np/numpy_tests/fft/test_pocketfft 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_de16653e9fff9a91_.log 2025-12-04T14:11:27.0205071Z Running 79 items in this shard: test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTShift::test_fft_n, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_all_1d_norm_preserving, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft3, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft3, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft3, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_hfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_identity, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_backward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_forward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_ortho, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ihfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_ifft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_rfft 2025-12-04T14:11:27.0225646Z 2025-12-04T14:11:27.0226030Z Finished torch_np/numpy_tests/fft/test_pocketfft 1/1 ... [2025-12-04 14:11:27.018047][19126.946343927], took 0.09min 2025-12-04T14:11:27.0513421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bedc5291ea06d7d4.xml 2025-12-04T14:11:27.1440662Z Running functorch/test_ops 1/9 ... [2025-12-04 14:11:27.143788][19127.072087697] 2025-12-04T14:11:27.1441536Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:11:27.1443475Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=1', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:11:27.144087] 2025-12-04T14:17:38.0423424Z 2025-12-04T14:17:38.0424368Z functorch/test_ops 1/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_1.9_c0eee613fccab590_.log 2025-12-04T14:17:38.0726305Z Running 1130 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_nll_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_softmax_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ge_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_glu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagflat_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_poisson_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sum_to_size_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_column_stack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_softmax_with_dtype_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_topk_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_T_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_vjp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_conj_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_flatten_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flatten_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_equal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_celu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argsort_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_householder_product_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardswish_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_xlog1py_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___radd___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_lengths_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmax_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gradient_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hash_tensor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hypot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_householder_product_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_layer_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_area_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bicubic_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_replicate_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rad2deg_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zero__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_lengths_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_arange_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_var_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mish_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_kaiser_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_unpack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_number_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_take_along_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_inner_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_poisson_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clone_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_arange_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isclose_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_nearest_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_sum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_linear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_general_cosine_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_layer_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_sparse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32
2025-12-04T14:17:38.1018801Z
2025-12-04T14:17:38.1019032Z Finished functorch/test_ops 1/9 ... [2025-12-04 14:17:38.043531][19497.971824397], took 6.18min
2025-12-04T14:17:38.1019769Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-7c0d16c38d6c5d66.xml
2025-12-04T14:17:38.1967942Z Running functorch/test_ops 6/9 ... [2025-12-04 14:17:38.196551][19498.124848884]
2025-12-04T14:17:38.1968363Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:17:38.1970661Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=6', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:17:38.196836]
2025-12-04T14:26:24.8016629Z
2025-12-04T14:26:24.8020071Z functorch/test_ops 6/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_6.9_28c20a9b783bf56f_.log
2025-12-04T14:26:24.8310586Z Running 1090 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_gaussian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_combinations_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logspace_tensor_overload_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multilabel_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_stack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_igamma_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_softmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ceil_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_sort_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_flatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mT_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_reshape_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmatmul___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_bag_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_hermite_polynomial_he_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logsumexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_nuc_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gaussian_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_short_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpySortAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_block_diag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ceil_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumsum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_deg2rad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_functorch_no_channels_last_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_normalize_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_minimum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gelu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_layer_norm_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_outer_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_blackman_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_consecutive_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_chunk_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_flatten_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gather_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_y1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_svd_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_tanhshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmatmul___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_dropout2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sgn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_geometric_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_outer_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_topk_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_put_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mode_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multilabel_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32
2025-12-04T14:26:24.8592178Z
2025-12-04T14:26:24.8592422Z Finished functorch/test_ops 6/9 ... [2025-12-04 14:26:24.803228][20024.731520754], took 8.78min
2025-12-04T14:26:24.8593151Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-53045e36c53c5e0b.xml
2025-12-04T14:26:24.9525872Z Running torch_np/numpy_tests/core/test_getlimits 1/1 ... [2025-12-04 14:26:24.952347][20024.880644606]
2025-12-04T14:26:24.9526391Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:26:24.9529267Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_getlimits.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:24.952659]
2025-12-04T14:26:28.3233363Z
2025-12-04T14:26:28.3234565Z torch_np/numpy_tests/core/test_getlimits 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_82959dff790c271d_.log
2025-12-04T14:26:28.3240305Z Running 17 items in this shard: test/torch_np/numpy_tests/core/test_getlimits.py::TestPythonFloat::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestHalf::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestSingle::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestDouble::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic_missing, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T0, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T1, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T2, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T3, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_finfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_iinfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_instances, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_known_types, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_plausible_finfo, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_subnormal_warning
2025-12-04T14:26:28.3244758Z
2025-12-04T14:26:28.3245024Z Finished torch_np/numpy_tests/core/test_getlimits 1/1 ... [2025-12-04 14:26:28.322993][20028.251284904], took 0.06min
2025-12-04T14:26:28.3576945Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-3f363f609079497e.xml
2025-12-04T14:26:28.4328578Z Running torch_np/test_ndarray_methods 1/1 ... [2025-12-04 14:26:28.432554][20028.360849183]
2025-12-04T14:26:28.4329025Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:26:28.4332480Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ndarray_methods.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:28.432994]
2025-12-04T14:26:34.7580829Z
2025-12-04T14:26:34.7582392Z torch_np/test_ndarray_methods 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ndarray_methods_1.1_6320f1f29b59eecd_.log
2025-12-04T14:26:34.7674672Z Running 342 items in this shard: test/torch_np/test_ndarray_methods.py::TestIndexing::test_indexing_simple, test/torch_np/test_ndarray_methods.py::TestIndexing::test_setitem, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_function, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_method, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_function, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_method, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_function, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_array_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_onedim, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_trivial, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_twodim, 
test/torch_np/test_ndarray_methods.py::TestNonzero::test_sparse, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_max, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_min, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmax_np_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmin_np_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmax_np_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmin_np_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data11, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data14, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data2, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data22, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data24, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data26, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data35, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data38, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data43, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data46, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data48, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data5, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data59, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data61, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data67, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data7, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data71, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data73, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmax::test_maximum_signed_integers, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data11, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data14, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data2, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data22, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data24, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data26, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data35, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data38, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data43, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data46, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data48, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data5, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data59, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data61, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data67, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data7, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data71, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data73, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmin::test_minimum_signed_integers, test/torch_np/test_ndarray_methods.py::TestAmax::test_basic, test/torch_np/test_ndarray_methods.py::TestAmin::test_basic, test/torch_np/test_ndarray_methods.py::TestContains::test_contains, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_fn, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_ivar, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_method, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_name, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_plain, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_rvar, 
test/torch_np/test_ndarray_methods.py::TestIter::test_iter_1d, test/torch_np/test_ndarray_methods.py::TestIter::test_iter_2d 2025-12-04T14:26:34.7763773Z 2025-12-04T14:26:34.7764025Z Finished torch_np/test_ndarray_methods 1/1 ... [2025-12-04 14:26:34.758691][20034.686985782], took 0.11min 2025-12-04T14:26:34.7930900Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-84ccb0941d397f92.xml 2025-12-04T14:26:34.8791631Z Running test_view_ops 1/1 ... [2025-12-04 14:26:34.878934][20034.807231487] 2025-12-04T14:26:34.8792040Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:26:34.8794965Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_view_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:34.879242] 2025-12-04T14:26:49.7676355Z 2025-12-04T14:26:49.7677677Z test_view_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_view_ops_1.1_cb0863e2f971ce65_.log 2025-12-04T14:26:49.7734309Z Running 279 items in this shard: test/test_view_ops.py::TestViewOpsCUDA::test_T_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_assignment_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_gradients_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_ellipses_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_newaxis_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_slice_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_chunk_view_cuda, 
test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_view_with_shared_memory_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_self_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_diagonal_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_uint8, 
test/test_view_ops.py::TestViewOpsCUDA::test_movedim_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_narrow_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_permute_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_select_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex64, 
test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_split_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unfold_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_complex_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex32, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_out_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_output_contiguous_cuda, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bool, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex128, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_T_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_as_strided_overflow_storage_offset_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex64, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_gradient_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_big_transpose_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_shapes_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_tensors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_chunk_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_conj_neg_view_numpy_error_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_contiguous_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_crow_col_indices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_empty_reshape_cuda, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_expand_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_flatten_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize__cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize_as_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_tensor_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_python_types_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_ravel_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_preserves_strides_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_overflow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_split_cuda, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_t_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_errors_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_uint8, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int16, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_unsqueeze_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_empty_cuda 2025-12-04T14:26:49.7788739Z 2025-12-04T14:26:49.7788928Z Finished test_view_ops 1/1 ... [2025-12-04 14:26:49.767851][20049.696144943], took 0.25min 2025-12-04T14:26:49.8019191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_view_ops/test_view_ops-7140a1ac93a67fd6.xml 2025-12-04T14:26:49.8805976Z Running test_nn 1/1 ... [2025-12-04 14:26:49.880357][20049.808654594] 2025-12-04T14:26:49.8806360Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:26:49.8809067Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nn.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:26:49.880638] 2025-12-04T14:32:36.6809209Z 2025-12-04T14:32:36.6809961Z test_nn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_nn_1.1_a9c5853bd400a590_.log 2025-12-04T14:32:36.7652080Z Running 2436 items in this shard: test/test_nn.py::TestNN::test_AdaptiveLogSoftmax, test/test_nn.py::TestNN::test_AdaptiveLogSoftmax_cuda_fp32, test/test_nn.py::TestNN::test_AdaptiveLogSoftmax_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_reduce, test/test_nn.py::TestNN::test_BCELoss_no_reduce_cuda, test/test_nn.py::TestNN::test_BCELoss_no_reduce_scalar, test/test_nn.py::TestNN::test_BCELoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_scalar, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_legacy_enum, 
test/test_nn.py::TestNN::test_BCEWithLogitsLoss_legacy_enum_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_scalar, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_CELU_no_batch_dim, test/test_nn.py::TestNN::test_CELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_CTCLoss_critical_target_len, test/test_nn.py::TestNN::test_CTCLoss_lengthchecks_cpu, test/test_nn.py::TestNN::test_CTCLoss_lengthchecks_cuda, test/test_nn.py::TestNN::test_CTCLoss_long_targets, test/test_nn.py::TestNN::test_CTCLoss_typechecks, test/test_nn.py::TestNN::test_CTCLoss_zero_infinity, test/test_nn.py::TestNN::test_CTCLoss_zero_lengths, test/test_nn.py::TestNN::test_Conv1d, 
test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_dilated, test/test_nn.py::TestNN::test_Conv1d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_groups, test/test_nn.py::TestNN::test_Conv1d_groups_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_groups_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad1, test/test_nn.py::TestNN::test_Conv1d_pad1_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad1_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad1size1, test/test_nn.py::TestNN::test_Conv1d_pad1size1_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad1size1_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad2, test/test_nn.py::TestNN::test_Conv1d_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad2size1, test/test_nn.py::TestNN::test_Conv1d_pad2size1_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad2size1_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_same, test/test_nn.py::TestNN::test_Conv1d_pad_same2, test/test_nn.py::TestNN::test_Conv1d_pad_same2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_same2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_same_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_same_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_valid, test/test_nn.py::TestNN::test_Conv1d_pad_valid_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_valid_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2, 
test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_stride, test/test_nn.py::TestNN::test_Conv1d_stride_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_stride_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_zero_batch, test/test_nn.py::TestNN::test_Conv1d_zero_batch_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_zero_batch_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise, test/test_nn.py::TestNN::test_Conv2d_depthwise_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier, 
test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_dilated, test/test_nn.py::TestNN::test_Conv2d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups, test/test_nn.py::TestNN::test_Conv2d_groups_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_no_bias, test/test_nn.py::TestNN::test_Conv2d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_pad_same, test/test_nn.py::TestNN::test_Conv2d_pad_same_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_pad_same_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated_cuda_tf32, 
test/test_nn.py::TestNN::test_Conv2d_pad_valid, test/test_nn.py::TestNN::test_Conv2d_pad_valid_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_pad_valid_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_padding, test/test_nn.py::TestNN::test_Conv2d_padding_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_padding_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_strided, test/test_nn.py::TestNN::test_Conv2d_strided_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_strided_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_zero_batch, test/test_nn.py::TestNN::test_Conv2d_zero_batch_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_zero_batch_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2_cuda_fp32, 
test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_dilated, test/test_nn.py::TestNN::test_Conv3d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_dilated_strided, test/test_nn.py::TestNN::test_Conv3d_dilated_strided_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_dilated_strided_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_groups, test/test_nn.py::TestNN::test_Conv3d_groups_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_groups_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_no_bias, test/test_nn.py::TestNN::test_Conv3d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_pad_same, test/test_nn.py::TestNN::test_Conv3d_pad_same_cuda_fp32, 
test/test_nn.py::TestNN::test_Conv3d_pad_same_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_pad_valid, test/test_nn.py::TestNN::test_Conv3d_pad_valid_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_pad_valid_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride, test/test_nn.py::TestNN::test_Conv3d_stride_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride_padding, test/test_nn.py::TestNN::test_Conv3d_stride_padding_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_padding_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_zero_batch, test/test_nn.py::TestNN::test_Conv3d_zero_batch_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_zero_batch_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor_cuda_tf32, 
test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d, test/test_nn.py::TestNN::test_ConvTranspose1d_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d_groups, test/test_nn.py::TestNN::test_ConvTranspose1d_groups_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_groups_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d, test/test_nn.py::TestNN::test_ConvTranspose2d_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias, 
test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose3d, test/test_nn.py::TestNN::test_ConvTranspose3d_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose3d_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_fp32, 
test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_CrossMapLRN2d, test/test_nn.py::TestNN::test_CrossMapLRN2d_cuda, test/test_nn.py::TestNN::test_ELU_no_batch_dim, test/test_nn.py::TestNN::test_ELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Embedding, test/test_nn.py::TestNN::test_EmbeddingBag_discontiguous, test/test_nn.py::TestNN::test_EmbeddingBag_discontiguous_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_max, test/test_nn.py::TestNN::test_EmbeddingBag_max_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_max_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_max_padding_idx_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_mean, test/test_nn.py::TestNN::test_EmbeddingBag_mean_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_mean_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_mean_padding_idx_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sparse, test/test_nn.py::TestNN::test_EmbeddingBag_sparse_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sum, test/test_nn.py::TestNN::test_EmbeddingBag_sum_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sum_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_sum_padding_idx_cuda, test/test_nn.py::TestNN::test_Embedding_cuda, test/test_nn.py::TestNN::test_Embedding_discontiguous, test/test_nn.py::TestNN::test_Embedding_discontiguous_cuda, test/test_nn.py::TestNN::test_Embedding_sparse, test/test_nn.py::TestNN::test_Embedding_sparse_cuda, test/test_nn.py::TestNN::test_Flatten, test/test_nn.py::TestNN::test_Flatten_cuda, test/test_nn.py::TestNN::test_Flatten_no_batch_dim, test/test_nn.py::TestNN::test_Flatten_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Fold, test/test_nn.py::TestNN::test_Fold_cuda, test/test_nn.py::TestNN::test_Fold_int_input, test/test_nn.py::TestNN::test_Fold_int_input_cuda, test/test_nn.py::TestNN::test_Fold_no_batch_dim_input, 
test/test_nn.py::TestNN::test_Fold_no_batch_dim_input_cuda, test/test_nn.py::TestNN::test_Fold_no_batch_dim_int_input, test/test_nn.py::TestNN::test_Fold_no_batch_dim_int_input_cuda, test/test_nn.py::TestNN::test_GELU_no_batch_dim, test/test_nn.py::TestNN::test_GELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_GLU_no_batch_dim, test/test_nn.py::TestNN::test_GLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardshrink_no_batch_dim, test/test_nn.py::TestNN::test_Hardshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardsigmoid_no_batch_dim, test/test_nn.py::TestNN::test_Hardsigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardswish_no_batch_dim, test/test_nn.py::TestNN::test_Hardswish_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardtanh_no_batch_dim, test/test_nn.py::TestNN::test_Hardtanh_no_batch_dim_cuda, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_margin_no_reduce, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_fp32, 
test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_reduce, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_HuberLoss_delta, test/test_nn.py::TestNN::test_HuberLoss_delta_cuda, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_batch_mean, test/test_nn.py::TestNN::test_KLDivLoss_batch_mean_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_double, 
test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_log_target_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_log_target_cuda, test/test_nn.py::TestNN::test_KLDivLoss_with_log_target_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_with_log_target_no_reduce_cuda, test/test_nn.py::TestNN::test_KLDivLoss_with_target_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_with_target_no_reduce_cuda, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_tf32, 
test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_L1Loss_no_reduce, test/test_nn.py::TestNN::test_L1Loss_no_reduce_complex, test/test_nn.py::TestNN::test_L1Loss_no_reduce_complex_cuda, test/test_nn.py::TestNN::test_L1Loss_no_reduce_cuda, test/test_nn.py::TestNN::test_L1Loss_no_reduce_scalar, test/test_nn.py::TestNN::test_L1Loss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_LSTM_cell, test/test_nn.py::TestNN::test_LSTM_cell_forward_hidden_size, test/test_nn.py::TestNN::test_LSTM_cell_forward_input_size, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_cuda, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_eval, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_eval_cuda, test/test_nn.py::TestNN::test_LeakyReLU_no_batch_dim, test/test_nn.py::TestNN::test_LeakyReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Linear, test/test_nn.py::TestNN::test_Linear_cuda_fp32, test/test_nn.py::TestNN::test_Linear_cuda_tf32, test/test_nn.py::TestNN::test_Linear_no_batch_dim, test/test_nn.py::TestNN::test_Linear_no_batch_dim_cuda_fp32, test/test_nn.py::TestNN::test_Linear_no_batch_dim_cuda_tf32, test/test_nn.py::TestNN::test_Linear_no_bias, test/test_nn.py::TestNN::test_Linear_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Linear_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_LogSigmoid_no_batch_dim, test/test_nn.py::TestNN::test_LogSigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_fp32, 
test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_MSELoss_no_reduce, test/test_nn.py::TestNN::test_MSELoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MSELoss_no_reduce_scalar, test/test_nn.py::TestNN::test_MSELoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_fp32, 
test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_MaxUnpool1d_net, test/test_nn.py::TestNN::test_MaxUnpool1d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool1d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool1d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MaxUnpool2d_net, test/test_nn.py::TestNN::test_MaxUnpool2d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool2d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool2d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MaxUnpool3d_net, test/test_nn.py::TestNN::test_MaxUnpool3d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool3d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool3d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Mish_no_batch_dim, test/test_nn.py::TestNN::test_Mish_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ModuleDict, test/test_nn.py::TestNN::test_ModuleList, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_0d_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_1d_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_index_neg, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_index_neg_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_double, 
test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_tf32, 
test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_1d_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_1d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_margin_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_margin_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_p_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_p_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_weights_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_tf32, 
test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_NLLLoss_no_reduce, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_nn.py::TestNN::test_PReLU_backward_requires_grad_false, test/test_nn.py::TestNN::test_PReLU_no_batch_dim, test/test_nn.py::TestNN::test_PReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_PairwiseDistance, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_lhs, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_lhs_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_rhs, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_rhs_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_no_batch_dim, test/test_nn.py::TestNN::test_PairwiseDistance_no_batch_dim_cuda, 
test/test_nn.py::TestNN::test_PairwiseDistance_with_non_default_args, test/test_nn.py::TestNN::test_PairwiseDistance_with_non_default_args_cuda, test/test_nn.py::TestNN::test_ParameterDict, test/test_nn.py::TestNN::test_ParameterDict_replication, test/test_nn.py::TestNN::test_ParameterList, test/test_nn.py::TestNN::test_ParameterList_meta, test/test_nn.py::TestNN::test_ParameterList_replication, test/test_nn.py::TestNN::test_PixelShuffle, test/test_nn.py::TestNN::test_PixelShuffle_cuda, test/test_nn.py::TestNN::test_PixelUnshuffle, test/test_nn.py::TestNN::test_PixelUnshuffle_cuda, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_reduce, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_RNN_cell, test/test_nn.py::TestNN::test_RNN_cell_forward_zero_hidden_size, test/test_nn.py::TestNN::test_RNN_cell_no_broadcasting, test/test_nn.py::TestNN::test_RNN_change_dropout, 
test/test_nn.py::TestNN::test_RNN_cpu_vs_cudnn_no_dropout, test/test_nn.py::TestNN::test_RNN_cpu_vs_cudnn_with_dropout, test/test_nn.py::TestNN::test_RNN_cudnn_weight_norm, test/test_nn.py::TestNN::test_RNN_dropout, test/test_nn.py::TestNN::test_RNN_dropout_state, test/test_nn.py::TestNN::test_RNN_input_size_zero, test/test_nn.py::TestNN::test_RNN_nonlinearity, test/test_nn.py::TestNN::test_RNN_nonlinearity_passed_as_arg, test/test_nn.py::TestNN::test_RReLU, test/test_nn.py::TestNN::test_RReLU_cuda, test/test_nn.py::TestNN::test_RReLU_no_batch_dim, test/test_nn.py::TestNN::test_RReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_RReLU_with_up_down, test/test_nn.py::TestNN::test_RReLU_with_up_down_cuda, test/test_nn.py::TestNN::test_RReLU_with_up_down_scalar, test/test_nn.py::TestNN::test_RReLU_with_up_down_scalar_cuda, test/test_nn.py::TestNN::test_ReLU6_no_batch_dim, test/test_nn.py::TestNN::test_ReLU6_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ReLU_no_batch_dim, test/test_nn.py::TestNN::test_ReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d, test/test_nn.py::TestNN::test_ReplicationPad3d_complex, test/test_nn.py::TestNN::test_ReplicationPad3d_complex_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d_no_batch_dim, test/test_nn.py::TestNN::test_ReplicationPad3d_no_batch_dim_cuda, test/test_nn.py::TestNN::test_SELU_no_batch_dim, test/test_nn.py::TestNN::test_SELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Sequential_add, test/test_nn.py::TestNN::test_Sequential_append, test/test_nn.py::TestNN::test_Sequential_delitem, test/test_nn.py::TestNN::test_Sequential_extend, test/test_nn.py::TestNN::test_Sequential_getitem, test/test_nn.py::TestNN::test_Sequential_iadd, test/test_nn.py::TestNN::test_Sequential_imul, test/test_nn.py::TestNN::test_Sequential_insert, test/test_nn.py::TestNN::test_Sequential_insert_fail_case, test/test_nn.py::TestNN::test_Sequential_mul, 
test/test_nn.py::TestNN::test_Sequential_pop, test/test_nn.py::TestNN::test_Sequential_rmul, test/test_nn.py::TestNN::test_Sequential_setitem, test/test_nn.py::TestNN::test_Sequential_setitem_named, test/test_nn.py::TestNN::test_SiLU_no_batch_dim, test/test_nn.py::TestNN::test_SiLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Sigmoid_no_batch_dim, test/test_nn.py::TestNN::test_Sigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_beta, test/test_nn.py::TestNN::test_SmoothL1Loss_beta_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_scalar, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_zero_beta, test/test_nn.py::TestNN::test_SmoothL1Loss_zero_beta_cuda, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean, 
test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_reduce, test/test_nn.py::TestNN::test_SoftMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_Softplus_no_batch_dim, test/test_nn.py::TestNN::test_Softplus_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Softshrink_no_batch_dim, test/test_nn.py::TestNN::test_Softshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Softsign_no_batch_dim, test/test_nn.py::TestNN::test_Softsign_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Tanh_no_batch_dim, test/test_nn.py::TestNN::test_Tanh_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Tanhshrink_no_batch_dim, test/test_nn.py::TestNN::test_Tanhshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Threshold_no_batch_dim, test/test_nn.py::TestNN::test_Threshold_no_batch_dim_cuda, test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation, test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation_cuda_fp32, 
test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation_cuda_tf32, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation_cuda_fp32, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation_cuda_tf32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation_cuda_fp32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation_cuda_tf32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation_cuda_fp32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation_cuda_tf32, test/test_nn.py::TestNN::test_Transformer_cell, test/test_nn.py::TestNN::test_Transformer_multilayer_coder, test/test_nn.py::TestNN::test_Transformer_multilayer_coder_cuda_fp32, test/test_nn.py::TestNN::test_Transformer_multilayer_coder_cuda_tf32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_double, 
test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_Unflatten_no_batch_dim, test/test_nn.py::TestNN::test_Unflatten_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Unfold, test/test_nn.py::TestNN::test_Unfold_cuda, test/test_nn.py::TestNN::test_Unfold_int_input, test/test_nn.py::TestNN::test_Unfold_int_input_cuda, test/test_nn.py::TestNN::test_adaptive_log_softmax, test/test_nn.py::TestNN::test_add_module, test/test_nn.py::TestNN::test_add_module_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_affine_grid, test/test_nn.py::TestNN::test_affine_grid_3d, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cpu_nd_2, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cpu_nd_3, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cuda_nd_2, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cuda_nd_3, test/test_nn.py::TestNN::test_affine_grid_error_checking, test/test_nn.py::TestNN::test_assignment, test/test_nn.py::TestNN::test_batch_norm_update_stats, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_NCHW_mixed_float16, 
test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_NCHW_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_float32, 
test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_NCHW_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_NCHW_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_native_mixed_bfloat16, 
test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_buffer_update_when_stats_are_not_tracked, test/test_nn.py::TestNN::test_batchnorm_cudnn_half, test/test_nn.py::TestNN::test_batchnorm_cudnn_nhwc, test/test_nn.py::TestNN::test_batchnorm_half_overflow, test/test_nn.py::TestNN::test_batchnorm_load_state_dict, test/test_nn.py::TestNN::test_batchnorm_nhwc_cpu, test/test_nn.py::TestNN::test_batchnorm_nhwc_cuda, test/test_nn.py::TestNN::test_batchnorm_non_contig_cpu_BatchNorm2d, test/test_nn.py::TestNN::test_batchnorm_non_contig_cpu_SyncBatchNorm, test/test_nn.py::TestNN::test_batchnorm_nonaffine_cuda_half_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_bias_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_less_than_one_value_per_channel, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_mean_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_var_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_var_or_running_mean_have_forward_grad, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_weight_is_not_same_size_as_input, test/test_nn.py::TestNN::test_bce_loss_always_nonnegative, test/test_nn.py::TestNN::test_bce_loss_broadcasts_weights, test/test_nn.py::TestNN::test_bce_loss_input_range, test/test_nn.py::TestNN::test_bce_loss_size_mismatch, test/test_nn.py::TestNN::test_bce_with_logits_broadcasts_pos_weights, test/test_nn.py::TestNN::test_bce_with_logits_broadcasts_weights, test/test_nn.py::TestNN::test_bce_with_logits_gives_same_result_as_sigmoid_and_bce_loss, test/test_nn.py::TestNN::test_bce_with_logits_gives_same_result_as_sigmoid_and_bce_loss_large_tensors_with_grad, test/test_nn.py::TestNN::test_bce_with_logits_has_correct_forward_grad, test/test_nn.py::TestNN::test_bce_with_logits_has_correct_grad_at_zero, 
test/test_nn.py::TestNN::test_bce_with_logits_ones_in_pos_weights_are_the_same_as_none, test/test_nn.py::TestNN::test_bce_with_logits_raises_if_target_and_input_are_different_size, test/test_nn.py::TestNN::test_bce_with_logits_stability, test/test_nn.py::TestNN::test_bce_with_logits_with_pos_weight_has_correct_grad_at_zero, test/test_nn.py::TestNN::test_bilinear, test/test_nn.py::TestNN::test_bilinear_broadcasting, test/test_nn.py::TestNN::test_bilinear_no_bias, test/test_nn.py::TestNN::test_bilinear_non_contiguous, test/test_nn.py::TestNN::test_bilinear_value_error, test/test_nn.py::TestNN::test_broadcast_double_backwards_gpu, test/test_nn.py::TestNN::test_broadcast_no_grad, test/test_nn.py::TestNN::test_broadcast_not_requiring_grad, test/test_nn.py::TestNN::test_buffer_bad_module_subclass, test/test_nn.py::TestNN::test_buffer_not_persistent, test/test_nn.py::TestNN::test_buffer_not_persistent_assign, test/test_nn.py::TestNN::test_buffer_not_persistent_del, test/test_nn.py::TestNN::test_buffer_not_persistent_load, test/test_nn.py::TestNN::test_buffer_not_persistent_overwrite, test/test_nn.py::TestNN::test_buffers_and_named_buffers, test/test_nn.py::TestNN::test_call_supports_python_dict_output, test/test_nn.py::TestNN::test_channel_shuffle_input_checks, test/test_nn.py::TestNN::test_channel_shuffle_return_alias_of_self, test/test_nn.py::TestNN::test_children, test/test_nn.py::TestNN::test_container_copy, test/test_nn.py::TestNN::test_convert_sync_batchnorm, test/test_nn.py::TestNN::test_cosine_embedding_loss_error_on_diff_shapes, test/test_nn.py::TestNN::test_cosine_embedding_loss_error_on_nonexpandable_shapes, test/test_nn.py::TestNN::test_cosine_embedding_loss_invalid_shape, test/test_nn.py::TestNN::test_cosine_embedding_loss_margin_no_reduce, test/test_nn.py::TestNN::test_cosine_embedding_loss_no_reduce, test/test_nn.py::TestNN::test_cosine_embedding_loss_with_diff_type, test/test_nn.py::TestNN::test_cosine_similarity, 
test/test_nn.py::TestNN::test_cross_entropy_loss, test/test_nn.py::TestNN::test_cross_entropy_loss_precision, test/test_nn.py::TestNN::test_cross_entropy_loss_zero_div, test/test_nn.py::TestNN::test_cudnn_forward_exception, test/test_nn.py::TestNN::test_cudnn_rnn_dropout_states_device, test/test_nn.py::TestNN::test_cudnn_weight_format, test/test_nn.py::TestNN::test_cudnn_weight_tying, test/test_nn.py::TestNN::test_dir, test/test_nn.py::TestNN::test_dir_digit, test/test_nn.py::TestNN::test_elu_inplace_gradgrad, test/test_nn.py::TestNN::test_elu_inplace_on_view, test/test_nn.py::TestNN::test_error_RNN_seq_len_zero, test/test_nn.py::TestNN::test_extra_state, test/test_nn.py::TestNN::test_extra_state_missing_get_extra_state, test/test_nn.py::TestNN::test_extra_state_missing_set_extra_state, test/test_nn.py::TestNN::test_extra_state_non_dict, test/test_nn.py::TestNN::test_fb_fc_packed, test/test_nn.py::TestNN::test_flatten, test/test_nn.py::TestNN::test_fold_invalid_arg, test/test_nn.py::TestNN::test_fractional_max_pool2d_invalid_output_ratio, test/test_nn.py::TestNN::test_gaussian_nll_loss_args, test/test_nn.py::TestNN::test_gaussian_nll_loss_broadcasting, test/test_nn.py::TestNN::test_gaussian_nll_loss_scalar_var, test/test_nn.py::TestNN::test_get_buffer, test/test_nn.py::TestNN::test_get_buffer_from_submodules, test/test_nn.py::TestNN::test_getattr_with_property, test/test_nn.py::TestNN::test_grid_sample, test/test_nn.py::TestNN::test_grid_sample_3d, test/test_nn.py::TestNN::test_grid_sample_error_checking, test/test_nn.py::TestNN::test_grid_sample_nearest_neighbor_rounding_mode_consistency, test/test_nn.py::TestNN::test_hardtanh_backward, test/test_nn.py::TestNN::test_hardtanh_inplace_gradgrad, test/test_nn.py::TestNN::test_huber_loss_invalid_delta, test/test_nn.py::TestNN::test_inplace_thnn, test/test_nn.py::TestNN::test_interpolate, test/test_nn.py::TestNN::test_interpolate_bicubic_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_cuda, 
test/test_nn.py::TestNN::test_interpolate_bicubic_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_shared_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_shared_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d, 
test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_buffer_overflow, test/test_nn.py::TestNN::test_interpolate_illegal_memory_access, test/test_nn.py::TestNN::test_interpolate_linear_1d, test/test_nn.py::TestNN::test_interpolate_linear_1d_align_corners, test/test_nn.py::TestNN::test_interpolate_linear_1d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_linear_1d_cuda, test/test_nn.py::TestNN::test_interpolate_linear_1d_zero_dim, test/test_nn.py::TestNN::test_interpolate_linear_1d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_align_corners, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_cuda, test/test_nn.py::TestNN::test_interpolate_linear_tuple_1d, test/test_nn.py::TestNN::test_interpolate_linear_tuple_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_1d, test/test_nn.py::TestNN::test_interpolate_nearest_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_1d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_1d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d, test/test_nn.py::TestNN::test_interpolate_nearest_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d_launch_configs, test/test_nn.py::TestNN::test_interpolate_nearest_2d_launch_configs_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_3d, test/test_nn.py::TestNN::test_interpolate_nearest_3d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_3d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_3d_zero_dim_cuda, 
test/test_nn.py::TestNN::test_interpolate_nearest_scale_1d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_2d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_3d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_3d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_1d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_2d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_3d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_zero_dim, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_align_corners, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_align_corners, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_cuda, test/test_nn.py::TestNN::test_interpolate_undefined_behavior_casting, test/test_nn.py::TestNN::test_kl_div_log_softmax_target, test/test_nn.py::TestNN::test_kl_div_with_diff_type, test/test_nn.py::TestNN::test_kl_div_with_diff_type_log_target, test/test_nn.py::TestNN::test_l1_loss_correct, test/test_nn.py::TestNN::test_large_max_pool2d_ch_last, test/test_nn.py::TestNN::test_layer_norm_backwards_eps, 
test/test_nn.py::TestNN::test_layer_norm_eps, test/test_nn.py::TestNN::test_layer_norm_grads_with_create_graph_flag, test/test_nn.py::TestNN::test_layer_norm_large_tensor, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightStrided, test/test_nn.py::TestNN::test_linear_broadcasting, test/test_nn.py::TestNN::test_linear_raise_on_scalar_input, test/test_nn.py::TestNN::test_log_softmax_dim0, test/test_nn.py::TestNN::test_log_softmax_dim0_cuda, test/test_nn.py::TestNN::test_log_softmax_dim3, test/test_nn.py::TestNN::test_log_softmax_dim3_cuda, test/test_nn.py::TestNN::test_log_softmax_lastdim, test/test_nn.py::TestNN::test_log_softmax_lastdim_cuda, test/test_nn.py::TestNN::test_log_softmax_scalar, test/test_nn.py::TestNN::test_log_softmax_scalar_cuda, test/test_nn.py::TestNN::test_log_softmax_spatial, 
test/test_nn.py::TestNN::test_log_softmax_spatial_cuda, test/test_nn.py::TestNN::test_log_softmax_spatial_special, test/test_nn.py::TestNN::test_log_softmax_spatial_special_cuda, test/test_nn.py::TestNN::test_loss_equal_input_target_shape, test/test_nn.py::TestNN::test_margin_ranking_loss_margin_no_reduce, test/test_nn.py::TestNN::test_margin_ranking_loss_no_reduce, test/test_nn.py::TestNN::test_max_pool1d_invalid_output_size, test/test_nn.py::TestNN::test_module_apply_inplace_op, test/test_nn.py::TestNN::test_module_backcompat, test/test_nn.py::TestNN::test_module_super_init, test/test_nn.py::TestNN::test_module_to_argparse, test/test_nn.py::TestNN::test_modules, test/test_nn.py::TestNN::test_mse_loss_size_warning, test/test_nn.py::TestNN::test_multimarginloss_1d_input_0d_target_no_reduce, test/test_nn.py::TestNN::test_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_nn.py::TestNN::test_named_children, test/test_nn.py::TestNN::test_named_modules, test/test_nn.py::TestNN::test_named_parameters_remove_duplicate, test/test_nn.py::TestNN::test_native_channel_shuffle_return_alias_of_self, test/test_nn.py::TestNN::test_nested_tensor_from_mask, test/test_nn.py::TestNN::test_nested_tensor_from_mask_error, test/test_nn.py::TestNN::test_no_grad, test/test_nn.py::TestNN::test_non_leaf_parameters, test/test_nn.py::TestNN::test_normalize, test/test_nn.py::TestNN::test_overwrite_module_params_on_conversion, test/test_nn.py::TestNN::test_pack_sequence_batch_sizes_throw, test/test_nn.py::TestNN::test_pad_scalar_error, test/test_nn.py::TestNN::test_padding_list, test/test_nn.py::TestNN::test_pairwise_distance, test/test_nn.py::TestNN::test_parameter_assignment, test/test_nn.py::TestNN::test_parameterlistdict_pickle, test/test_nn.py::TestNN::test_parameterlistdict_setting_attributes, test/test_nn.py::TestNN::test_parameters_and_named_parameters, test/test_nn.py::TestNN::test_parameters_to_vector, test/test_nn.py::TestNN::test_parse_to, 
test/test_nn.py::TestNN::test_partial_flat_weights, test/test_nn.py::TestNN::test_pdist, test/test_nn.py::TestNN::test_pdist_cpu_gradgrad_unimplemented, test/test_nn.py::TestNN::test_pdist_cuda_gradgrad_unimplemented, test/test_nn.py::TestNN::test_pdist_empty_col, test/test_nn.py::TestNN::test_pdist_empty_row, test/test_nn.py::TestNN::test_pdist_large, test/test_nn.py::TestNN::test_pdist_zeros, test/test_nn.py::TestNN::test_pickle_module_no_weights_only_warning, test/test_nn.py::TestNN::test_pixel_shuffle_nhwc_cpu, test/test_nn.py::TestNN::test_pixel_shuffle_unshuffle, test/test_nn.py::TestNN::test_pointwise_loss_broadcast, test/test_nn.py::TestNN::test_pointwise_loss_target_grad_none_reduction, test/test_nn.py::TestNN::test_projections_errors_on_gru_and_rnn, test/test_nn.py::TestNN::test_projections_lstm_args_check, test/test_nn.py::TestNN::test_projections_lstm_check_device, test/test_nn.py::TestNN::test_projections_lstm_initial_hidden_state, test/test_nn.py::TestNN::test_register_buffer_allows_overwriting_with_same_name, test/test_nn.py::TestNN::test_register_buffer_allows_tensor_like_object, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_name_is_not_string, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_not_tensor, test/test_nn.py::TestNN::test_register_parameter_allows_overwriting_with_same_name, test/test_nn.py::TestNN::test_register_parameter_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_register_parameter_raises_error_if_name_is_not_string, test/test_nn.py::TestNN::test_relu_inplace_on_view, test/test_nn.py::TestNN::test_repr, test/test_nn.py::TestNN::test_requires_grad_, test/test_nn.py::TestNN::test_rnn_args_check, test/test_nn.py::TestNN::test_rnn_check_device, test/test_nn.py::TestNN::test_rnn_initial_hidden_state, test/test_nn.py::TestNN::test_rnn_weight_norm, test/test_nn.py::TestNN::test_set_submodule, 
test/test_nn.py::TestNN::test_share_memory, test/test_nn.py::TestNN::test_smoothl1loss_intergral_target, test/test_nn.py::TestNN::test_smoothl1loss_negative_beta_not_supported, test/test_nn.py::TestNN::test_softmax_functional_dim0, test/test_nn.py::TestNN::test_softmax_functional_dim0_cuda, test/test_nn.py::TestNN::test_softmax_functional_dim3, test/test_nn.py::TestNN::test_softmax_functional_dim3_cuda, test/test_nn.py::TestNN::test_softmax_functional_scalar, test/test_nn.py::TestNN::test_softmax_functional_scalar_cuda, test/test_nn.py::TestNN::test_softmax_lastdim, test/test_nn.py::TestNN::test_softmax_lastdim_cuda, test/test_nn.py::TestNN::test_softmax_lastdim_dtype, test/test_nn.py::TestNN::test_softmax_lastdim_dtype_cuda, test/test_nn.py::TestNN::test_softmax_spatial, test/test_nn.py::TestNN::test_softmax_spatial_cuda, test/test_nn.py::TestNN::test_softmax_spatial_dtype, test/test_nn.py::TestNN::test_softmax_spatial_dtype_cuda, test/test_nn.py::TestNN::test_softmax_spatial_special, test/test_nn.py::TestNN::test_softmax_spatial_special_cuda, test/test_nn.py::TestNN::test_softmin, test/test_nn.py::TestNN::test_spectral_norm, test/test_nn.py::TestNN::test_spectral_norm_dim, test/test_nn.py::TestNN::test_spectral_norm_forward, test/test_nn.py::TestNN::test_spectral_norm_load_state_dict, test/test_nn.py::TestNN::test_spectral_norm_pickle, test/test_nn.py::TestNN::test_state_dict, test/test_nn.py::TestNN::test_swap_module_params_poisons_acc_grad, test/test_nn.py::TestNN::test_sync_batchnorm_accuracy_cuda, test/test_nn.py::TestNN::test_sync_batchnorm_backward_elemt, test/test_nn.py::TestNN::test_threshold_bfloat16_half, test/test_nn.py::TestNN::test_threshold_int, test/test_nn.py::TestNN::test_to, test/test_nn.py::TestNN::test_train_errors_for_invalid_mode, test/test_nn.py::TestNN::test_transformer_args_check, test/test_nn.py::TestNN::test_transformer_layer_args_check, test/test_nn.py::TestNN::test_transformerdecoder, 
test/test_nn.py::TestNN::test_transformerdecoderlayer, test/test_nn.py::TestNN::test_transformerdecoderlayer_gelu, test/test_nn.py::TestNN::test_triplet_margin_loss, test/test_nn.py::TestNN::test_triplet_margin_loss_no_reduce, test/test_nn.py::TestNN::test_triplet_margin_loss_swap, test/test_nn.py::TestNN::test_triplet_margin_loss_swap_no_reduce, test/test_nn.py::TestNN::test_type, test/test_nn.py::TestNN::test_unflatten, test/test_nn.py::TestNN::test_unflatten_invalid_arg, test/test_nn.py::TestNN::test_unfold_invalid_arg, test/test_nn.py::TestNN::test_upsamplingBilinear2d_spatial_invariance, test/test_nn.py::TestNN::test_upsamplingLinear1d, test/test_nn.py::TestNN::test_upsamplingLinear1d_spatial_invariance, test/test_nn.py::TestNN::test_upsamplingTrilinear3d_spatial_invariance, test/test_nn.py::TestNN::test_upsampling_bfloat16, test/test_nn.py::TestNN::test_upsampling_not_recompute_scale_factor, test/test_nn.py::TestNN::test_upsampling_small_scale, test/test_nn.py::TestNN::test_vector_to_parameters, test/test_nn.py::TestNN::test_weight_norm, test/test_nn.py::TestNN::test_weight_norm_pickle, test/test_nn.py::TestNN::test_weighted_huber_loss, test/test_nn.py::TestNN::test_weighted_l1_loss_with_weights, test/test_nn.py::TestNN::test_weighted_mse_loss, test/test_nn.py::TestNN::test_zero_grad, test/test_nn.py::TestFusionEval::test_fuse_module_eval_numerics, test/test_nn.py::TestConstantPadNd::test_constant_pad_nd, test/test_nn.py::TestConstantPadNd::test_preserves_memory_format, test/test_nn.py::TestAddRelu::test_add_relu, test/test_nn.py::TestAddRelu::test_add_relu_broadcasting, test/test_nn.py::TestFunctionalPickle::test_pickle_softsign, test/test_nn.py::TestFusionUtils::test_fuse_conv_bn_requires_grad, test/test_nn.py::TestFusionUtils::test_fuse_linear_bn_requires_grad, test/test_nn.py::TestUtils::test_consume_prefix_in_state_dict_if_present, test/test_nn.py::TestNNDeviceTypeCUDA::test_BatchNorm_empty_cuda_float64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_Bilinear_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_cudnn_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_empty_target_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_mean_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_mean_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_none_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_none_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_sum_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_sum_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GRU_grad_and_gradgrad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_memory_format_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_numeric_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_raises_error_if_one_value_per_group_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm1d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm2d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm3d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_differentiable_backward_using_oneDNN_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_differentiable_backward_using_oneDNN_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_grad_and_gradgrad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_LayerNorm_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LayerNorm_numeric_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_LocalResponseNorm_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_race_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_race_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_warnings_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad2d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad2d_large_deterministic_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad3d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_empty_cuda_complex64, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_fails_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad1d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad2d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad3d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad_empty_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerDecoderLayer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerDecoder_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerEncoderLayer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerEncoder_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_Transformer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_Unfold_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_half_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_half_cpu_cuda_float16, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_adaptiveavg_pool1d_shmem_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate45_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate90_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotateRandom_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_3d_rotateRandom_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_avg_pool_large_tensor2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_avg_pool_large_tensor_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_large_batch_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_large_batch_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_update_stats_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_channel_shuffle_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_error_if_nonfinite_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_0_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_1_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_4_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_inf_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_0_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_1_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_4_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_inf_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_multi_device_foreach_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_multi_device_foreach_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_value_foreach_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_value_foreach_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_consistent_index_target_and_probs_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_errors_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_weight_ignore_indices_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_with_probs_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_2d_out_of_bounds_class_index_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_2d_out_of_bounds_class_index_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_index_target_unit_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_one_hot_target_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_all_reductions_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_mean_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_mean_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_none_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_none_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_sum_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_sum_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_unit_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cpu_length_cuda_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cuda_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_error_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cudnn_rnn_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_cudnn_rnn_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_device_mask_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_elu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_elu_inplace_with_neg_alpha_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_fold_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_glu_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_bfloat16_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_half_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_2d_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_2d_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_3d_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_3d_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_nan_inf_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_nan_inf_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardsigmoid_grad_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_for_single_spatial_element_during_training_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_False_affine_False_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_less_than_one_value_per_channel_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_invalid_reduction_strings_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_large_max_pool2d_ch_last_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_large_max_pool_contig_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_large_reflect_pad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_layernorm_half_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_layernorm_weight_bias_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_with_neg_slope_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_with_zero_slope_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_linear_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_big_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_big_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_logsigmoid_out_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_lstmcell_backward_only_one_output_grad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_TxT_layout_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_devices_parity_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_forward_with_nans_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_lowp_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_lowp_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_mask_types_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_transformer_layout_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_mish_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_non_recursive_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_mse_loss_error_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_1d_input_1d_target_invalid_size_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_all_ignored_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_byte_target_matches_long_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_mean_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_none_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_sum_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_invalid_target_dim_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_invalid_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_mismatched_batch_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_out_of_bounds_ignore_index_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_total_weight_is_zero_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_scalars_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_scalars_reductions_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nonlinearity_propagate_nan_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_one_hot_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_overwrite_module_params_on_conversion_cpu_device_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_pad_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_pad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_prelu_backward_32bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_replicatepad_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_numeric_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_numeric_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_fused_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_fused_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rrelu_bounds_validation_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_save_lstm_compatibility_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_silu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_skip_init_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_smooth_l1_loss_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smooth_l1_loss_vs_huber_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smoothl1loss_backward_zero_beta_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_64bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_smem_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_unaligned_grad_output_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_unaligned_output_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_without_fully_vectorized_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_bfloat16_half_to_float_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_double_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_forward_64bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_results_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_results_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_softplus_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softplus_low_threshold_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_negative_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_threshold_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_complex128, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_complex64, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_fast_path_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_gelu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_gelu_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_triplet_margin_with_distance_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_triplet_margin_with_distance_loss_default_parity_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_False_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_False_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_True_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_True_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_False_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_False_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_True_input_size_399_output_size_437_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_True_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bicubic_memory_format0_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int16_cuda_int16, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_float64_cuda_float64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_float32_cuda_float32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_uint8_cuda_uint8, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int8_cuda_int8, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int64_cuda_int64, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int32_cuda_int32, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int16_cuda_int16, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_aa_correctness_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_aa_correctness_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_correctness_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBilinear2d_aa_correctness_memory_format0_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBilinear2d_aa_correctness_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_correctness_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_correctness_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_fail_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_rocm_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format0_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format0_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format1_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format1_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format0_isize_20_osize_11_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format0_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format0_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format1_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format1_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_correctness_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_correctness_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_rescale_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format1_isize_20_osize_11_cuda, 
test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_False_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_False_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_True_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_True_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsampling_64bit_indexing_channels_last_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsampling_64bit_indexing_channels_last_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingnearest2d_backward_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_warp_softmax_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_warp_softmax_64bit_indexing_cuda_float32 2025-12-04T14:32:36.8459252Z 2025-12-04T14:32:36.8459444Z Finished test_nn 1/1 ... [2025-12-04 14:32:36.684475][20396.612766287], took 5.78min 2025-12-04T14:32:36.8460058Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nn/test_nn-9b13ea9411b68db1.xml 2025-12-04T14:32:38.4145883Z Uploading artifacts took 1.50 seconds 2025-12-04T14:32:38.4149131Z Running torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 14:32:38.414671][20398.342967493] 2025-12-04T14:32:38.4149656Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:32:38.4153089Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_index_tricks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:32:38.415040] 2025-12-04T14:32:41.9360988Z 2025-12-04T14:32:41.9362447Z torch_np/numpy_tests/lib/test_index_tricks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a3265ffbea1c2963_.log 2025-12-04T14:32:41.9375486Z Running 47 items in this shard: test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_big_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_clipmodes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_dtypes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_clip, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_raise, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_unravel, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_writeability, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_longdouble, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npcomplexfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_linspace_equivalence, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start0_stop_10_step0_expected0, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start_-10_stop_20_step1_expected1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_nd, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_sparse, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_1d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_2d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_complex_step, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_more_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdenumerate::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_simple_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_1d_only, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_bool, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_repeated_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_shape_and_dtype, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestC::test_c_, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_hetero_shape_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_low_dim_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_operate_4d_array, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_wide_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndices::test_diag_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_diag_indices_from, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_shape_mismatch, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_small_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdIndex::test_ndindex 2025-12-04T14:32:41.9387105Z 2025-12-04T14:32:41.9387382Z Finished torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 14:32:41.935902][20401.864186735], took 0.06min 2025-12-04T14:32:41.9707285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-6bd6911b2987aa11.xml 2025-12-04T14:32:42.0032822Z Running test_jit_autocast 1/1 ... [2025-12-04 14:32:42.003037][20401.931335564] 2025-12-04T14:32:42.0033515Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:32:42.0036271Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:32:42.003335] 2025-12-04T14:33:04.0700866Z 2025-12-04T14:33:04.0701669Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_125de5112eb33739_.log 2025-12-04T14:33:04.0712607Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, 
test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, 
test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing 2025-12-04T14:33:04.0723123Z 2025-12-04T14:33:04.0723325Z Finished test_jit_autocast 1/1 ... [2025-12-04 14:33:04.069940][20423.998237005], took 0.37min 2025-12-04T14:33:04.1046281Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-a884db72e4413a95.xml 2025-12-04T14:33:04.1831916Z Running nn/test_pooling 1/1 ... [2025-12-04 14:33:04.182930][20424.111229562] 2025-12-04T14:33:04.1832366Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:33:04.1835179Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pooling.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:33:04.183242] 2025-12-04T14:33:14.0156003Z 2025-12-04T14:33:14.0156798Z nn/test_pooling 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pooling_1.1_2fc4f4b87243805c_.log 2025-12-04T14:33:14.0196681Z Running 147 items in this shard: test/nn/test_pooling.py::TestAvgPool::test_avg_pool1d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool2d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool3d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d_with_divisor, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d_with_divisor, test/nn/test_pooling.py::TestPoolingNN::test_MaxUnpool2d_output_size, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_nhwc_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc, 
test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_backward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_forward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_non_contiguous, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_lower_precision, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_none, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_overflow, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool2d_nhwc_cpu, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool3d_input_check, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool1d_empty_kernel, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool3d, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float32, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool2d_empty_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool3d_backward_after_cat_dim1_device_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_LPPool1d_kernel_size_overflow_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_errors_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case10_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case4_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case5_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case6_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case7_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case8_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case9_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_invalid_output_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool2d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool3d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_max_pooling_backward_fails_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pool_odd_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_uint8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_float16, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool3d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_with_indices_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_unpool_invalid_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool3d_non_square_backward_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_large_size_int64_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_size_one_feature_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_bfloat16_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_zero_stride_cuda 2025-12-04T14:33:14.0234832Z 2025-12-04T14:33:14.0235030Z Finished nn/test_pooling 1/1 ... [2025-12-04 14:33:14.015642][20433.943940075], took 0.16min 2025-12-04T14:33:14.0503950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-21d4709389e5b8ee.xml 2025-12-04T14:33:14.1551794Z Running nn/test_embedding 1/1 ... [2025-12-04 14:33:14.154929][20434.083227628] 2025-12-04T14:33:14.1552212Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:33:14.1554572Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_embedding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:33:14.155216] 2025-12-04T14:33:26.6899079Z 2025-12-04T14:33:26.6899978Z nn/test_embedding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_embedding_1.1_0d2f816320a0d140_.log 2025-12-04T14:33:26.6953497Z Running 156 items in this shard: test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_padding_idx_error, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int16, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_uint8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm_unsorted_repeating_indices, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_basic, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_empty_tensor, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_2d_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained, 
test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_large_tensors, test/nn/test_embedding.py::TestEmbeddingNN::test_move_sparse_half_embedding, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_large_batch_overflow_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_dimension_errors_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_dense_grad_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float16, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_scalar_weight_error_cuda 2025-12-04T14:33:26.7004482Z 2025-12-04T14:33:26.7004689Z Finished nn/test_embedding 1/1 ... [2025-12-04 14:33:26.689980][20446.61827472], took 0.21min 2025-12-04T14:33:26.7250628Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-ee3bcbb253c53d76.xml 2025-12-04T14:33:26.8136493Z Running test_xnnpack_integration 1/1 ... 
[2025-12-04 14:33:26.813409][20446.741708008] 2025-12-04T14:33:26.8136936Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:33:26.8139794Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:33:26.813719] 2025-12-04T14:33:37.1340874Z 2025-12-04T14:33:37.1341744Z test_xnnpack_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_1.1_cee22bfc3e04069b_.log 2025-12-04T14:33:37.1345621Z Running 12 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear_1d_input, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_combined_model, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_decomposed_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_basic, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_with_relu_fc 2025-12-04T14:33:37.1348892Z 2025-12-04T14:33:37.1349172Z Finished test_xnnpack_integration 1/1 ... [2025-12-04 14:33:37.133695][20457.061988547], took 0.17min 2025-12-04T14:33:37.1683612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-f847b013a309e4b4.xml 2025-12-04T14:33:37.2587629Z Running test_cuda_trace 1/1 ... 
[2025-12-04 14:33:37.258505][20457.186804792] 2025-12-04T14:33:37.2588048Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:33:37.2591028Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_trace.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:33:37.258825] 2025-12-04T14:34:21.1299881Z 2025-12-04T14:34:21.1300648Z test_cuda_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_trace_1.1_a09bfc4c91236df9_.log 2025-12-04T14:34:21.1303814Z Running 12 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called, test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback, test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization, test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback 2025-12-04T14:34:21.1306483Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called 2025-12-04T14:34:21.1307025Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback 2025-12-04T14:34:21.1307837Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback 2025-12-04T14:34:21.1308366Z Running 1 items in 
this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback 2025-12-04T14:34:21.1308876Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback 2025-12-04T14:34:21.1309402Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback 2025-12-04T14:34:21.1310045Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback 2025-12-04T14:34:21.1310566Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization 2025-12-04T14:34:21.1311086Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback 2025-12-04T14:34:21.1311639Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback 2025-12-04T14:34:21.1312162Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback 2025-12-04T14:34:21.1312681Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback 2025-12-04T14:34:21.1312999Z 2025-12-04T14:34:21.1313181Z Finished test_cuda_trace 1/1 ... 
[2025-12-04 14:34:21.130001][20501.058296639], took 0.73min 2025-12-04T14:34:21.1654615Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d415fb392a79d6b5.xml 2025-12-04T14:34:21.2481933Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-020323916fe208b4.xml 2025-12-04T14:34:21.2781412Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-ec2ab5853a8f5bb5.xml 2025-12-04T14:34:21.3049778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2344770aa1066fe.xml 2025-12-04T14:34:21.3324132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7974fe8ff8b4bbf1.xml 2025-12-04T14:34:21.3609026Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a89ca46644423e33.xml 2025-12-04T14:34:21.3908728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a24d5339b3f8a425.xml 2025-12-04T14:34:21.4181212Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-e1f683ecde7e9859.xml 2025-12-04T14:34:21.4586455Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-18c9d7ef1c10ca2f.xml 2025-12-04T14:34:21.4897647Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c34005545b418b69.xml 2025-12-04T14:34:21.5146544Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-fef6b14731bb5c3b.xml 
2025-12-04T14:34:21.5464047Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-9f269287574f2f5d.xml 2025-12-04T14:34:21.5748947Z Running test_native_mha 1/1 ... [2025-12-04 14:34:21.574689][20501.502988246] 2025-12-04T14:34:21.5749377Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:34:21.5752289Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_native_mha.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:34:21.574982] 2025-12-04T14:34:25.7962187Z 2025-12-04T14:34:25.7963266Z test_native_mha 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_native_mha_1.1_0baa40d098e975c1_.log 2025-12-04T14:34:25.7992622Z Running 54 items in this shard: test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_attention_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_attention_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_encoder_decoder_attention_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_encoder_decoder_attention_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, 
test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_transform_bias_rescale_qkv_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_transform_bias_rescale_qkv_nested_cuda_float32 2025-12-04T14:34:25.8019794Z 2025-12-04T14:34:25.8019981Z Finished test_native_mha 1/1 ... [2025-12-04 14:34:25.796051][20505.72434675], took 0.07min 2025-12-04T14:34:25.8325473Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_native_mha/test_native_mha-92c62998ce15c273.xml 2025-12-04T14:34:25.8648914Z Running torch_np/numpy_tests/core/test_numerictypes 1/1 ... 
[2025-12-04 14:34:25.864651][20505.792949057] 2025-12-04T14:34:25.8649732Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:34:25.8652192Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_numerictypes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:34:25.864943] 2025-12-04T14:34:29.2353113Z 2025-12-04T14:34:29.2354789Z torch_np/numpy_tests/core/test_numerictypes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_numerictypes_1.1_2b678d8fcdaea099_.log 2025-12-04T14:34:29.2367562Z Running 34 items in this shard: test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_loses1, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_loses2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_wins, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_wins2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_wins3, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_both_abstract, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_nondtype_nonscalartype, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_same, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_sibling_class, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_subclass, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_subclass_backwards, test/torch_np/numpy_tests/core/test_numerictypes.py::TestBitName::test_abstract, test/torch_np/numpy_tests/core/test_numerictypes.py::TestDocStrings::test_platform_dependent_aliases, 
test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t0, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t1, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t3, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t4, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t5, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t6, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t7, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t8, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t9, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_unique, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t0, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t1, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t3, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t4, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t5, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t6, 
test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t7, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t8, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t9 2025-12-04T14:34:29.2377553Z 2025-12-04T14:34:29.2377842Z Finished torch_np/numpy_tests/core/test_numerictypes 1/1 ... [2025-12-04 14:34:29.234992][20509.163287227], took 0.06min 2025-12-04T14:34:29.2719639Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numerictypes/torch_np.numpy_tests.core.test_numerictypes-794afd975a25dd08.xml 2025-12-04T14:34:29.3027120Z Running test_cuda_nvml_based_avail 1/1 ... [2025-12-04 14:34:29.302486][20509.230785716] 2025-12-04T14:34:29.3027554Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:34:29.3030469Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_nvml_based_avail.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:34:29.302790] 2025-12-04T14:34:58.9806270Z 2025-12-04T14:34:58.9807292Z test_cuda_nvml_based_avail 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_nvml_based_avail_1.1_f1988982a9fd6374_.log 2025-12-04T14:34:58.9811847Z Running 9 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init2, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_0, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_1, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init2, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_0, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_1, test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_env_var_parsing, test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_ordinal_parse_visible_devices, test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_partial_uuid_resolver 2025-12-04T14:34:58.9814922Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init2 2025-12-04T14:34:58.9816001Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_0 2025-12-04T14:34:58.9817470Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_1 2025-12-04T14:34:58.9818372Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init2 2025-12-04T14:34:58.9819128Z Running 1 
items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_0 2025-12-04T14:34:58.9819898Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_1 2025-12-04T14:34:58.9820565Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_env_var_parsing 2025-12-04T14:34:58.9821183Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_ordinal_parse_visible_devices 2025-12-04T14:34:58.9821824Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_partial_uuid_resolver 2025-12-04T14:34:58.9822183Z 2025-12-04T14:34:58.9822390Z Finished test_cuda_nvml_based_avail 1/1 ... [2025-12-04 14:34:58.980513][20538.908808132], took 0.49min 2025-12-04T14:34:59.0178655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-264e490a42609567.xml 2025-12-04T14:34:59.1032939Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3424f861fbb3f30b.xml 2025-12-04T14:34:59.1376259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-34f31a11d054a5e2.xml 2025-12-04T14:34:59.1694336Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-5cc2b0915fc09b1e.xml 2025-12-04T14:34:59.2259404Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8b636380fe175cf1.xml 2025-12-04T14:34:59.2577247Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3e5b65c914a199ce.xml 2025-12-04T14:34:59.2999308Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-df0d139575237859.xml 2025-12-04T14:34:59.3426213Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3264b6f40acb9dc5.xml 2025-12-04T14:34:59.3738636Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8f705997ba04d72c.xml 2025-12-04T14:34:59.4339855Z Running test_function_schema 1/1 ... [2025-12-04 14:34:59.433744][20539.362041768] 2025-12-04T14:34:59.4340291Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:34:59.4343071Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_function_schema.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:34:59.434032] 2025-12-04T14:35:02.9546880Z 2025-12-04T14:35:02.9548547Z test_function_schema 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_function_schema_1.1_77625ad433e579dc_.log 2025-12-04T14:35:02.9554056Z Running 15 items in this shard: test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_arguments, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_outputs, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_structure, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_with_smart_serialization, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_real_use_case, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_with_out, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_without_out, test/test_function_schema.py::TestFunctionSchema::test_hash_schema, test/test_function_schema.py::TestFunctionSchema::test_out_schema, test/test_function_schema.py::TestFunctionSchema::test_schema_error, test/test_function_schema.py::TestFunctionSchema::test_serialize_and_deserialize, test/test_function_schema.py::TestFunctionSchema::test_string_optional_parameter_default_value, test/test_function_schema.py::TestFunctionSchema::test_sym_int_argument_properly_parsed, test/test_function_schema.py::TestFunctionSchema::test_tensor_list_alias_annotation_properly_parsed, test/test_function_schema.py::TestFunctionSchema::test_tensor_option_arguments_properly_parsed 2025-12-04T14:35:02.9558117Z 2025-12-04T14:35:02.9558334Z Finished test_function_schema 1/1 ... 
[2025-12-04 14:35:02.954338][20542.882626612], took 0.06min 2025-12-04T14:35:02.9923486Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_function_schema/test_function_schema-23d7f60e4862c430.xml 2025-12-04T14:35:03.0264645Z Running test_accelerator 1/1 ... [2025-12-04 14:35:03.026224][20542.954524021] 2025-12-04T14:35:03.0265345Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:03.0267855Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_accelerator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:03.026513] 2025-12-04T14:35:06.7976756Z 2025-12-04T14:35:06.7977712Z test_accelerator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_accelerator_1.1_df51eee5e1970e59_.log 2025-12-04T14:35:06.7981622Z Running 12 items in this shard: test/test_accelerator.py::TestAccelerator::test_current_accelerator, test/test_accelerator.py::TestAccelerator::test_current_stream_query, test/test_accelerator.py::TestAccelerator::test_device_context_manager, test/test_accelerator.py::TestAccelerator::test_generic_event_behavior, test/test_accelerator.py::TestAccelerator::test_generic_multi_device_behavior, test/test_accelerator.py::TestAccelerator::test_generic_stream_behavior, test/test_accelerator.py::TestAccelerator::test_get_memory_info, test/test_accelerator.py::TestAccelerator::test_memory_stats, test/test_accelerator.py::TestAccelerator::test_multi_device_context_manager, test/test_accelerator.py::TestAccelerator::test_multi_device_stream_context_manager, test/test_accelerator.py::TestAccelerator::test_pin_memory_on_non_blocking_copy, test/test_accelerator.py::TestAccelerator::test_stream_context_manager 2025-12-04T14:35:06.7984713Z 2025-12-04T14:35:06.7984921Z Finished test_accelerator 1/1 ... 
[2025-12-04 14:35:06.797373][20546.725663795], took 0.06min 2025-12-04T14:35:06.8349910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_accelerator/test_accelerator-96bd402fc51988fa.xml 2025-12-04T14:35:06.8701594Z Running nn/test_init 1/1 ... [2025-12-04 14:35:06.869916][20546.798215867] 2025-12-04T14:35:06.8702008Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:06.8704828Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:06.870206] 2025-12-04T14:35:13.2956163Z 2025-12-04T14:35:13.2957639Z nn/test_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_init_1.1_83fc7b147f0c612f_.log 2025-12-04T14:35:13.2965753Z Running 30 items in this shard: test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu, test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu_only_accepts_numbers, test/nn/test_init.py::TestNNInit::test_calculate_gain_linear, test/nn/test_init.py::TestNNInit::test_calculate_gain_nonlinear, test/nn/test_init.py::TestNNInit::test_calculate_gain_only_accepts_valid_nonlinearities, test/nn/test_init.py::TestNNInit::test_constant, test/nn/test_init.py::TestNNInit::test_deprecation, test/nn/test_init.py::TestNNInit::test_dirac_identity, test/nn/test_init.py::TestNNInit::test_dirac_only_works_on_3_4_5d_inputs, test/nn/test_init.py::TestNNInit::test_dirac_properties, test/nn/test_init.py::TestNNInit::test_eye, test/nn/test_init.py::TestNNInit::test_eye_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_kaiming_normal, test/nn/test_init.py::TestNNInit::test_kaiming_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_normal_warning_on_0element_tensor, 
test/nn/test_init.py::TestNNInit::test_kaiming_uniform, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_warning_on_0element_tensor, test/nn/test_init.py::TestNNInit::test_normal, test/nn/test_init.py::TestNNInit::test_ones_and_zeros, test/nn/test_init.py::TestNNInit::test_orthogonal, test/nn/test_init.py::TestNNInit::test_sparse_default_std, test/nn/test_init.py::TestNNInit::test_sparse_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_trunc_normal, test/nn/test_init.py::TestNNInit::test_trunc_normal_generator, test/nn/test_init.py::TestNNInit::test_uniform, test/nn/test_init.py::TestNNInit::test_xavier_normal, test/nn/test_init.py::TestNNInit::test_xavier_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_xavier_uniform, test/nn/test_init.py::TestNNInit::test_xavier_uniform_errors_on_inputs_smaller_than_2d 2025-12-04T14:35:13.2971346Z 2025-12-04T14:35:13.2971565Z Finished nn/test_init 1/1 ... [2025-12-04 14:35:13.295298][20553.223588868], took 0.11min 2025-12-04T14:35:13.3344332Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_init/nn.test_init-4d2f3b79bbd3891e.xml 2025-12-04T14:35:13.3936482Z Running torch_np/numpy_tests/core/test_scalar_methods 1/1 ... [2025-12-04 14:35:13.393381][20553.321678288] 2025-12-04T14:35:13.3937059Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:13.3939597Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_scalar_methods.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:13.393681] 2025-12-04T14:35:16.4132838Z 2025-12-04T14:35:16.4134606Z torch_np/numpy_tests/core/test_scalar_methods 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_scalar_methods_1.1_ab75aa6938d6d2cf_.log 2025-12-04T14:35:16.4156463Z Running 77 items in this shard: test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_against_known_values, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_errors_ftype0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_errors_ftype1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_errors_ftype2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_roundtrip_ftype0_frac_vals0_exp_vals0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_roundtrip_ftype1_frac_vals1_exp_vals1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_roundtrip_ftype2_frac_vals2_exp_vals2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_simple_fractions_ftype0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_simple_fractions_ftype1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_simple_fractions_ftype2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_-0_875_ratio1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_0_0_ratio2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_0_875_ratio0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_11_5_ratio3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_-0_875_ratio1, 
test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_0_0_ratio2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_0_875_ratio0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_11_5_ratio3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_-0_875_ratio1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_0_0_ratio2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_0_875_ratio0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_11_5_ratio3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_b, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_h, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_i, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_l, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_inf_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_inf_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_inf_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_nan_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_nan_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_nan_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_B, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_b, 
test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_h, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_i, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_l, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls4, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls5, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_non_numeric_cls0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_?, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_B, 
test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_D, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_F, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_b, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_h, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_i, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_l, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_scalar, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_bit_count, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype4, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype5, 
test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype6, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype7 2025-12-04T14:35:16.4177476Z 2025-12-04T14:35:16.4177755Z Finished torch_np/numpy_tests/core/test_scalar_methods 1/1 ... [2025-12-04 14:35:16.413193][20556.341489955], took 0.05min 2025-12-04T14:35:16.4520203Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_methods/torch_np.numpy_tests.core.test_scalar_methods-ab020ac5345dfbce.xml 2025-12-04T14:35:16.4965989Z Running torch_np/numpy_tests/fft/test_helper 1/1 ... [2025-12-04 14:35:16.496316][20556.424615962] 2025-12-04T14:35:16.4966804Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:16.4969194Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/fft/test_helper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:16.496610] 2025-12-04T14:35:24.1234875Z 2025-12-04T14:35:24.1236505Z torch_np/numpy_tests/fft/test_helper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.fft.test_helper_1.1_692efc716c4081b6_.log 2025-12-04T14:35:24.1239491Z Running 8 items in this shard: test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_axes_keyword, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_definition, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_equal_to_original, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_inverse, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_uneven_dims, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTFreq::test_definition, test/torch_np/numpy_tests/fft/test_helper.py::TestRFFTFreq::test_definition, test/torch_np/numpy_tests/fft/test_helper.py::TestIRFFTN::test_not_last_axis_success 2025-12-04T14:35:24.1241548Z 2025-12-04T14:35:24.1241803Z Finished torch_np/numpy_tests/fft/test_helper 1/1 ... [2025-12-04 14:35:24.123094][20564.051384261], took 0.13min 2025-12-04T14:35:24.1622193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_helper/torch_np.numpy_tests.fft.test_helper-f56f2bb61bb011d4.xml 2025-12-04T14:35:24.2945823Z Running test_mobile_optimizer 1/1 ... [2025-12-04 14:35:24.294304][20564.222602934] 2025-12-04T14:35:24.2946541Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:24.2948736Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mobile_optimizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:24.294583] 2025-12-04T14:35:29.2173716Z 2025-12-04T14:35:29.2174636Z test_mobile_optimizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mobile_optimizer_1.1_9983f8f253c5b059_.log 2025-12-04T14:35:29.2177210Z Running 7 items in this shard: test/test_mobile_optimizer.py::TestOptimizer::test_clone_module_with_class, test/test_mobile_optimizer.py::TestOptimizer::test_generate_mobile_module_lints, test/test_mobile_optimizer.py::TestOptimizer::test_hoist_conv_packed_params, test/test_mobile_optimizer.py::TestOptimizer::test_mobilenet_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_preserve_bundled_inputs_methods, test/test_mobile_optimizer.py::TestOptimizer::test_quantized_conv_no_asan_failures 2025-12-04T14:35:29.2179704Z 2025-12-04T14:35:29.2179974Z Finished test_mobile_optimizer 1/1 ... [2025-12-04 14:35:29.217130][20569.145425029], took 0.08min 2025-12-04T14:35:29.2559111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-6eaefe04adaaf056.xml 2025-12-04T14:35:29.2939934Z Running test_overrides 1/1 ... [2025-12-04 14:35:29.293759][20569.2220571] 2025-12-04T14:35:29.2940393Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:29.2943398Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_overrides.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:29.294079] 2025-12-04T14:35:35.9195668Z 2025-12-04T14:35:35.9196818Z test_overrides 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_overrides_1.1_89ffca0310ffa68a_.log 2025-12-04T14:35:35.9526487Z Running 1477 items in this shard: test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_H___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_T___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__backward_hooks___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__base___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__cdata___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__grad_fn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__post_accumulate_grad_hooks___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__version___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_data___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_device___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_dtype___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad_dtype___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad_fn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_imag___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_cpu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_cuda___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_ipu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_leaf___get__, 
test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_maia___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_meta___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mkldnn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mps___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mtia___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_nested___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_quantized___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_sparse___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_sparse_csr___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_vulkan___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_xla___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_xpu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_itemsize___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_layout___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_mH___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_mT___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_name___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_names___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_nbytes___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_ndim___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_output_nr___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_real___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_requires_grad___get__, 
test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_retains_grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_shape___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_volatile___get__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___add__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___and__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___array__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___array_wrap__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___bool__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___complex__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___contains__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___cuda_array_interface_____get__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___deepcopy__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___div__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___dlpack__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___dlpack_device__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___eq__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___float__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___floordiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___format__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ge__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___getitem__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___gt__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___iadd__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___iand__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___idiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ifloordiv__, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ilshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___imod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___imul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___index__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___int__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___invert__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ior__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___irshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___isub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ixor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___le__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___len__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___long__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___lshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___lt__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___matmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___mod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___mul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ne__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___nonzero__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___or__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___radd__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rand__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rdiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___reduce_ex__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___repr__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___reversed__, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rfloordiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rlshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmatmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ror__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rpow__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rrshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rsub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rxor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___setitem__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___setstate__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___sub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___truediv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___xor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__autocast_to_full_precision, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__autocast_to_reduced_precision, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__clear_non_serializable_cached_data, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__coalesced_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__dimI, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__dimV, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__is_view, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_size, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_storage_offsets, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_strides, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nnz, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__sparse_mask_projection, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__to_dense, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__update_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__values, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_abs, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_abs_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_absolute, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_absolute_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addbmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addbmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcdiv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcdiv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcmul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcmul_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmv_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addr_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_adjoint, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_align_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_align_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_all, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_allclose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_amax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_amin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_aminmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_angle, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_any, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_apply_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argmax, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argmin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argsort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argwhere, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_backward, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_baddbmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_baddbmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bernoulli, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bernoulli_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bfloat16, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bincount, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_and, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_and_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_left_shift, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_left_shift_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_not, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_not_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_or, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_or_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_right_shift, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_right_shift_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_xor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_xor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bool, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_broadcast_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_byte, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cauchy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ccol_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cdouble, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ceil, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ceil_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cfloat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_chalf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_char, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_max, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_max_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_min, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_min_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clip, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clip_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clone, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_coalesce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_col_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj_physical, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj_physical_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_contiguous, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copysign, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copysign_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_corrcoef, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_count_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cov, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cpu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cross, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_crow_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cuda, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cummax, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cummin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumprod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumprod_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumsum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumsum_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_data_ptr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_deg2rad, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_deg2rad_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dense_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dequantize, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_det, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_detach, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_detach_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diag, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diag_embed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagflat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagonal_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diff, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_digamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dim_order, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dist, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_div, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_div_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_divide, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_double, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_element_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_eq, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_eq_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erf_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfinv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expand, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expand_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expm1_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exponential_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fill_diagonal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fix, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fix_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flatten, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flip, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fliplr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flipud, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float_power, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float_power_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmod_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frac, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frac_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gather, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gcd, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gcd_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ge, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ge_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_geometric_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_geqrf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ger, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_get_device, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_half, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hardshrink, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_has_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hash_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_heaviside, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_heaviside_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_histc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_histogram, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hypot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hypot_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_i0, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_i0_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igammac, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igammac_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_copy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_copy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_fill, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_put, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_put_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_reduce_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_inner, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_int, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_int_repr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ipu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_coalesced, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_complex, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_contiguous, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_distributed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_floating_point, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_inference, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_pinned, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_same_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_set_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_shared, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_signed, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isclose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isfinite, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isinf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isnan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isneginf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isposinf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isreal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_istft, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_item, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_kron, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_kthvalue, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lcm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lcm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ldexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ldexp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_le, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_le_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lerp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lerp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lgamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lgamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log10, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log10_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log1p_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logaddexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logaddexp2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logcumsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logdet, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_and, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_and_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_not, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_not_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_or, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_or_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_xor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_xor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logit_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_long, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lu_solve, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_map2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_map_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_fill, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_scatter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matrix_power, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_max, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_maximum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mean, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_median, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_min, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_minimum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mode, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_module_load, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_moveaxis, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_movedim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_msort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mtia, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mul_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multinomial, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multiply, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multiply_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mvlgamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mvlgamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nan_to_num, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nan_to_num_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanmean, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanmedian, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanquantile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nansum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_narrow, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_narrow_copy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ndimension, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ne, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ne_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_neg_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_negative, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_negative_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nelement, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nextafter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nextafter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nonzero_static, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_norm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_not_equal, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_not_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_numel, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_numpy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_orgqr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ormqr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_outer, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_permute, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pin_memory, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pinverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_polygamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_positive, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pow, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pow_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_prelu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_prod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_put, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_put_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_axis, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_scales, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_zero_points, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_scale, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_zero_point, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_qr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_qscheme, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_quantile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rad2deg, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rad2deg_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_random_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ravel, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reciprocal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reciprocal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_record_stream, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_refine_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_register_hook, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_register_post_accumulate_grad_hook, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_relu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_relu_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_remainder, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_remainder_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rename, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rename_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_renorm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_renorm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_repeat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_repeat_interleave, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_requires_grad_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reshape, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reshape_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as_sparse_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resolve_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resolve_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_retain_grad, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_roll, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rot90, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_round, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_round_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_row_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rsqrt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rsqrt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_reduce_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_select_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_set_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sgn, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sgn_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_share_memory_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_short, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sigmoid_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sign, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sign_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_signbit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slice_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slice_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_smm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_mask, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_resize_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_resize_and_clear_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sqrt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_square, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_square_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_squeeze, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_squeeze_, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sspaddmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_std, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_stft, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage_offset, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage_type, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sub, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sub_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_subtract, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_subtract_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sum_to_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_svd, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapaxes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapaxes_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapdims, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapdims_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_t, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_t_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_take, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_take_along_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_dense, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_mkldnn, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_sparse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tolist, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_topk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trace, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_transpose_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triangular_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tril, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tril_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triu_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_true_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_true_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trunc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trunc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_type, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_type_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unbind, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unfold, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unique, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unique_consecutive, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_split_with_sizes, 
test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsqueeze, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsqueeze_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_untyped_storage, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_values, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_var, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_vdot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_view, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_view_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_vsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_where, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xlogy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xpu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_zero_, test/test_overrides.py::TestTorchFunctionOverride::test_base, test/test_overrides.py::TestTorchFunctionOverride::test_dtype_override, test/test_overrides.py::TestTorchFunctionOverride::test_grad, test/test_overrides.py::TestTorchFunctionOverride::test_has_torch_function_non_sequence, test/test_overrides.py::TestTorchFunctionOverride::test_mean_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_mm_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_pow_rpow, test/test_overrides.py::TestTorchFunctionOverride::test_precedence_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_tensor_subclass_propagation, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fftshift, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifftshift, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cholesky_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cond, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cross, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_det, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eig, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigvals, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigvalsh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_householder_product, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_inv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_inv_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_factor, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_factor_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lstsq, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_factor, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_factor_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_power, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_rank, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_multi_dot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_pinv, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_qr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve_triangular, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_svd, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_svdvals, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_tensorinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_tensorsolve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vander, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vecdot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vector_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_avg_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_avg_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_gelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_linear, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_log_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_one_hot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_scaled_dot_product_attention, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_softplus, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_softshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_airy_ai, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_j0, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_j1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_y0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_y1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_u, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_v, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_w, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_entr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erf, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfcx, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_expit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammainc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammaincc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammaln, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_hermite_polynomial_h, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_hermite_polynomial_he, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i0e, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i1e, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_laguerre_polynomial_l, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_legendre_polynomial_p, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log_ndtr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_logit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_i1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_k0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_k1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_multigammaln, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_ndtr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_ndtri, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_psi, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_round, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_scaled_modified_bessel_k0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_scaled_modified_bessel_k1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_u, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_v, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_w, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_spherical_bessel_j0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_xlog1py, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_zeta, test/test_overrides.py::TestTorchFunctionOverride::test_torch__assert_async, test/test_overrides.py::TestTorchFunctionOverride::test_torch__conj_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__functional_assert_async, test/test_overrides.py::TestTorchFunctionOverride::test_torch__fused_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__fw_primal_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lobpcg_lobpcg, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lowrank_pca_lowrank, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lowrank_svd_lowrank, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch__make_dual_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__native_batch_norm_legit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__neg_view_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__reshape_alias_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__rowwise_prune, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sparse_broadcast_to_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_acos, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_asin, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_atan, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_cos, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sin, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_tan, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__values_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__wrapped_linear_prepack, test/test_overrides.py::TestTorchFunctionOverride::test_torch__wrapped_quantized_linear_prepacked, test/test_overrides.py::TestTorchFunctionOverride::test_torch_abs, test/test_overrides.py::TestTorchFunctionOverride::test_torch_absolute, test/test_overrides.py::TestTorchFunctionOverride::test_torch_acos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_acosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adaptive_avg_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adaptive_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_add, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_addbmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addcdiv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addcmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addmv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adjoint, test/test_overrides.py::TestTorchFunctionOverride::test_torch_affine_grid_generator, test/test_overrides.py::TestTorchFunctionOverride::test_torch_alias_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_all, test/test_overrides.py::TestTorchFunctionOverride::test_torch_allclose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_amax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_amin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_aminmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_angle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_any, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arccos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arccosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arcsin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arcsinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctan2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argsort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argwhere, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_as_strided_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_as_strided_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_asin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_asinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atan2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_avg_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_baddbmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_backward_elemt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_backward_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_elemt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_gather_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_gather_stats_with_counts, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_update_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bernoulli, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bilinear, test/test_overrides.py::TestTorchFunctionOverride::test_torch_binary_cross_entropy_with_logits, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bincount, test/test_overrides.py::TestTorchFunctionOverride::test_torch_binomial, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_and, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_left_shift, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_not, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_or, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_right_shift, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_xor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_broadcast_to, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bucketize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ccol_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ceil, test/test_overrides.py::TestTorchFunctionOverride::test_torch_celu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_channel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_choose_qparams_optimized, test/test_overrides.py::TestTorchFunctionOverride::test_torch_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clip, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clone, test/test_overrides.py::TestTorchFunctionOverride::test_torch_col_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_column_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_combinations, test/test_overrides.py::TestTorchFunctionOverride::test_torch_complex, test/test_overrides.py::TestTorchFunctionOverride::test_torch_concat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_concatenate, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conj_physical, test/test_overrides.py::TestTorchFunctionOverride::test_torch_constant_pad_nd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_tbc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_copysign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_corrcoef, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosine_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosine_similarity, test/test_overrides.py::TestTorchFunctionOverride::test_torch_count_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cov, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cross, test/test_overrides.py::TestTorchFunctionOverride::test_torch_crow_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ctc_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cummax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cummin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumprod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumsum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumulative_trapezoid, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_deg2rad, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dequantize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_det, test/test_overrides.py::TestTorchFunctionOverride::test_torch_detach, test/test_overrides.py::TestTorchFunctionOverride::test_torch_detach_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diag_embed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagflat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diff, test/test_overrides.py::TestTorchFunctionOverride::test_torch_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_divide, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dsmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_embedding, test/test_overrides.py::TestTorchFunctionOverride::test_torch_embedding_bag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_empty_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_eq, test/test_overrides.py::TestTorchFunctionOverride::test_torch_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_erf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_erfc, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_expand_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fake_quantize_per_channel_affine, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fake_quantize_per_tensor_affine, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_fp16_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_fp16_weight_fp32_activation, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_int8_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_int8_weight_fp32_activation, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_quantize_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_pack_gemm_matrix_fp16, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_pack_quantized_matrix, test/test_overrides.py::TestTorchFunctionOverride::test_torch_feature_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_feature_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fix, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flatten, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flip, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fliplr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flipud, test/test_overrides.py::TestTorchFunctionOverride::test_torch_float_power, test/test_overrides.py::TestTorchFunctionOverride::test_torch_floor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_floor_divide, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frac, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frobenius_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_full_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_empty_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_float_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_scalar_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_mixed_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_nested_tuple_getitem, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_not_first_in_list, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_precedence_in_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_block_diag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_broadcast_tensors, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_cartesian_prod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_cdist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_chain_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_einsum, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_lu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_meshgrid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_stft, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_tensordot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unique, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unique_consecutive, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unravel_index, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fused_moving_avg_obs_fake_quant, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gather, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gcd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ge, test/test_overrides.py::TestTorchFunctionOverride::test_torch_geqrf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ger, test/test_overrides.py::TestTorchFunctionOverride::test_torch_get_device, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gradient, test/test_overrides.py::TestTorchFunctionOverride::test_torch_greater, test/test_overrides.py::TestTorchFunctionOverride::test_torch_greater_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler_2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler_3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gru, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gru_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gt, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_hardshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hash_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_heaviside, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hinge_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histogram, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histogramdd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hsmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hypot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch_igamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_igammac, test/test_overrides.py::TestTorchFunctionOverride::test_torch_imag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_add, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_fill, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_put, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_inner, test/test_overrides.py::TestTorchFunctionOverride::test_torch_instance_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_int_repr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_complex, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_distributed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_floating_point, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_inference, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_same_size, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_signed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isclose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isfinite, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isinf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isnan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isneginf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isposinf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isreal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_istft, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kl_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kron, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kthvalue, test/test_overrides.py::TestTorchFunctionOverride::test_torch_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lcm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ldexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_le, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lerp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_less, test/test_overrides.py::TestTorchFunctionOverride::test_torch_less_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lgamma, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_log, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log10, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logaddexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logaddexp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logcumsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_and, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_not, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_or, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_xor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lstm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lstm_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lu_unpack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_margin_ranking_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_fill, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matrix_power, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_maximum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_median, test/test_overrides.py::TestTorchFunctionOverride::test_torch_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_minimum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_add_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_depthwise_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_rnn, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mode, test/test_overrides.py::TestTorchFunctionOverride::test_torch_moveaxis, test/test_overrides.py::TestTorchFunctionOverride::test_torch_movedim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_msort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_multinomial, test/test_overrides.py::TestTorchFunctionOverride::test_torch_multiply, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mvlgamma, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nan_to_num, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanmean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanmedian, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanquantile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nansum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_narrow, test/test_overrides.py::TestTorchFunctionOverride::test_torch_narrow_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_channel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ne, test/test_overrides.py::TestTorchFunctionOverride::test_torch_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_negative, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nextafter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional__threshold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_avg_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_avg_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool2d_with_indices, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_affine_grid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_binary_cross_entropy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_binary_cross_entropy_with_logits, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_celu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_cosine_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_cross_entropy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_ctc_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_elu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_embedding, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_embedding_bag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_feature_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool2d, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool2d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_gaussian_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_glu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_grid_sample, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_gumbel_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_hardtanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_hinge_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_huber_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_instance_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_interpolate, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_kl_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_l1_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_leaky_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_local_response_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool2d, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_margin_ranking_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool2d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_mish, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_mse_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multi_head_attention_forward, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multi_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multilabel_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multilabel_soft_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_normalize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_pad, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_poisson_nll_loss, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_relu6, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_rrelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_selu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_silu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_smooth_l1_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_soft_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softsign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_tanhshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_triplet_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_triplet_margin_with_distance_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_unfold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_constant_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_kaiming_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nonzero_static, test/test_overrides.py::TestTorchFunctionOverride::test_torch_norm_except_dim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_not_equal, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_nuclear_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_numel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ones_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_orgqr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ormqr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_outer, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pairwise_distance, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pdist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_permute, test/test_overrides.py::TestTorchFunctionOverride::test_torch_permute_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pinverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pixel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pixel_unshuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_poisson, test/test_overrides.py::TestTorchFunctionOverride::test_torch_poisson_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_polar, test/test_overrides.py::TestTorchFunctionOverride::test_torch_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_positive, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pow, test/test_overrides.py::TestTorchFunctionOverride::test_torch_prelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_prod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_put, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_axis, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_scales, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_zero_points, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_scale, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_zero_point, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_qr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_channel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_tensor_dynamic, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_gru_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_lstm_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_rnn_relu_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_rnn_tanh_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rad2deg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ravel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_real, test/test_overrides.py::TestTorchFunctionOverride::test_torch_reciprocal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_remainder, test/test_overrides.py::TestTorchFunctionOverride::test_torch_renorm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_repeat_interleave, test/test_overrides.py::TestTorchFunctionOverride::test_torch_reshape, test/test_overrides.py::TestTorchFunctionOverride::test_torch_resolve_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_resolve_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rms_norm, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_relu_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_tanh_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_roll, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rot90, test/test_overrides.py::TestTorchFunctionOverride::test_torch_round, test/test_overrides.py::TestTorchFunctionOverride::test_torch_row_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_row_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rrelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rsqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rsub, test/test_overrides.py::TestTorchFunctionOverride::test_torch_saddmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter_add, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_searchsorted, test/test_overrides.py::TestTorchFunctionOverride::test_torch_segment_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_selu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sgn, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_signbit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sin, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch_smm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_with_sizes_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_square, test/test_overrides.py::TestTorchFunctionOverride::test_torch_squeeze, test/test_overrides.py::TestTorchFunctionOverride::test_torch_squeeze_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_std, test/test_overrides.py::TestTorchFunctionOverride::test_torch_std_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sub, test/test_overrides.py::TestTorchFunctionOverride::test_torch_subtract, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_svd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_swapaxes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_swapdims, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_float, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_int, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_ite, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_not, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_sum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch_t_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_take, test/test_overrides.py::TestTorchFunctionOverride::test_torch_take_along_dim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_threshold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_topk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trace, test/test_overrides.py::TestTorchFunctionOverride::test_torch_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_transpose_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trapezoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trapz, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triangular_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tril, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triplet_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_true_divide, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trunc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unbind, 
test/test_overrides.py::TestTorchFunctionOverride::test_torch_unbind_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unflatten, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unfold_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsqueeze, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsqueeze_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_values_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_var, test/test_overrides.py::TestTorchFunctionOverride::test_torch_var_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vdot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_complex, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_complex_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_real, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_real_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_where, test/test_overrides.py::TestTorchFunctionOverride::test_torch_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_zeros_like, test/test_overrides.py::TestTorchFunctionOverride::test_user_implementation_raises, test/test_overrides.py::TestEinsumOverride::test_wrapper, test/test_overrides.py::TestGradCheckOverride::test_gradcheck, test/test_overrides.py::TestNamedTuple::test_max, test/test_overrides.py::TestGradNewOnesOverride::test_newones, 
test/test_overrides.py::TestPickle::test_pickle, test/test_overrides.py::TestBroadcastAllOverride::test_broadcast_all, test/test_overrides.py::TestWrapTorchFunction::test_wrap_torch_function, test/test_overrides.py::TestIndexing::test_getitem, test/test_overrides.py::TestIndexing::test_getitem_subclass, test/test_overrides.py::TestIndexing::test_setitem, test/test_overrides.py::TestIndexing::test_setitem_subclass, test/test_overrides.py::TestIndexing::test_setitem_val, test/test_overrides.py::TestIterator::test_iterator, test/test_overrides.py::TestRNN::test_rnn, test/test_overrides.py::TestDisabledTorchFunction::test_parameter_does_not_prevent_dispatch, test/test_overrides.py::TestResolveName::test_resolve_name, test/test_overrides.py::TestTorchFunctionWarning::test_torch_function_standalone_class, test/test_overrides.py::TestTorchFunctionWarning::test_torch_function_tensor_subclass, test/test_overrides.py::TestDisabledUserWarnings::test_no_implicit_user_warning_for_deprecated_functions, test/test_overrides.py::TestTorchFunctionMode::test_all_same_mode, test/test_overrides.py::TestTorchFunctionMode::test_basic, test/test_overrides.py::TestTorchFunctionMode::test_custom_device_type, test/test_overrides.py::TestTorchFunctionMode::test_device_context_semantics, test/test_overrides.py::TestTorchFunctionMode::test_disable_enable_subclass, test/test_overrides.py::TestTorchFunctionMode::test_disable_enable_torch_function_ctx, test/test_overrides.py::TestTorchFunctionMode::test_disable_subclass_mode, test/test_overrides.py::TestTorchFunctionMode::test_disable_subclass_not_mode, test/test_overrides.py::TestTorchFunctionMode::test_distributions_bernoulli, test/test_overrides.py::TestTorchFunctionMode::test_error_using_class_method_on_mode, test/test_overrides.py::TestTorchFunctionMode::test_factory_override, test/test_overrides.py::TestTorchFunctionMode::test_get_cur_mode, test/test_overrides.py::TestTorchFunctionMode::test_get_mode_stack, 
test/test_overrides.py::TestTorchFunctionMode::test_getitem_call, test/test_overrides.py::TestTorchFunctionMode::test_mode_notimplemented_loop, test/test_overrides.py::TestTorchFunctionMode::test_modes_handle_first, test/test_overrides.py::TestTorchFunctionMode::test_modes_return_notimplemented, test/test_overrides.py::TestTorchFunctionMode::test_nested_modes_with_python_has_torch_function, test/test_overrides.py::TestTorchFunctionMode::test_nested_same_mode, test/test_overrides.py::TestTorchFunctionMode::test_nn_parse_to, test/test_overrides.py::TestTorchFunctionMode::test_reentrant_mode_idiom, test/test_overrides.py::TestTorchFunctionMode::test_restacking_with_ancestor, test/test_overrides.py::TestTorchFunctionMode::test_subclass_hash, test/test_overrides.py::TestTorchFunctionMode::test_torch_function_all_disabled_api, test/test_overrides.py::TestTorchFunctionMode::test_with_mode, test/test_overrides.py::TestTorchFunctionMode::test_with_mode_created_separately, test/test_overrides.py::TestTorchFunctionMode::test_with_nested_modes 2025-12-04T14:35:35.9843532Z 2025-12-04T14:35:35.9843754Z Finished test_overrides 1/1 ... [2025-12-04 14:35:35.921063][20575.849356173], took 0.11min 2025-12-04T14:35:35.9844446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_overrides/test_overrides-54542bcdcb986158.xml 2025-12-04T14:35:36.0748818Z Running torch_np/test_function_base 1/1 ... [2025-12-04 14:35:36.074636][20576.002933099] 2025-12-04T14:35:36.0749274Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:36.0752275Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_function_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:36.074949] 2025-12-04T14:35:39.3453708Z 2025-12-04T14:35:39.3454663Z torch_np/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_function_base_1.1_500584d725d1f2e7_.log 2025-12-04T14:35:39.3455875Z Running 2 items in this shard: test/torch_np/test_function_base.py::TestAppend::test_basic, test/torch_np/test_function_base.py::TestMisc::test_broadcast_shapes 2025-12-04T14:35:39.3456463Z 2025-12-04T14:35:39.3456754Z Finished torch_np/test_function_base 1/1 ... [2025-12-04 14:35:39.345005][20579.273296317], took 0.05min 2025-12-04T14:35:39.3841406Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_function_base/torch_np.test_function_base-75280ab4c9ebfe9c.xml 2025-12-04T14:35:39.4349197Z Running test_type_promotion 1/1 ... [2025-12-04 14:35:39.434665][20579.362963488] 2025-12-04T14:35:39.4349657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:39.4352284Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_promotion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:39.434965] 2025-12-04T14:35:49.1653450Z 2025-12-04T14:35:49.1654742Z test_type_promotion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_promotion_1.1_6db001c785be877d_.log 2025-12-04T14:35:49.1773544Z Running 423 items in this shard: test/test_type_promotion.py::TestTypePromotionCUDA::test_add_wrapped_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alpha_mismatch_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alternate_result_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_bfloat16_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_booleans_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_can_cast_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_out_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_comparison_ops_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_assertraises_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_scalar_mult_tensor_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_computation_ignores_out_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_create_bool_tensors_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_float_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_from_issue_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_fail_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_inplace_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_to_float_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_lt_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_many_promotions_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_mixed_type_backward_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_non_promoting_ops_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int16, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex128, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_self_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_types_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_uint8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_tensor_vs_scalar_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_add_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_mul_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_sub_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_ternary_out_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_transpose_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unsigned_cuda 2025-12-04T14:35:49.1886486Z 2025-12-04T14:35:49.1886708Z Finished test_type_promotion 1/1 ... [2025-12-04 14:35:49.165786][20589.094079301], took 0.16min 2025-12-04T14:35:49.2060350Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_promotion/test_type_promotion-263da3bcb01bdba5.xml 2025-12-04T14:35:49.3134386Z Running torch_np/test_scalars_0D_arrays 1/1 ... [2025-12-04 14:35:49.313186][20589.241483944] 2025-12-04T14:35:49.3134883Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:49.3137728Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_scalars_0D_arrays.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:49.313510] 2025-12-04T14:35:52.6357587Z 2025-12-04T14:35:52.6358483Z torch_np/test_scalars_0D_arrays 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_scalars_0D_arrays_1.1_ed0b8e715c6c1568_.log 2025-12-04T14:35:52.6368438Z Running 33 items in this shard: test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_array, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_asarray, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_asarray_int, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_int64, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_array, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_asarray, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_asarray_int, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_int64, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_array, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_asarray, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_asarray_int, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_int64, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_scalar_comparisons, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value0, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value1, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value10, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value11, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value4, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value5, 
test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value6, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value7, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value8, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value9, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value_s, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value_string, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_array_0D, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_array_1D, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_array_2D, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_float32, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_int, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_list, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_list-list, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_literal 2025-12-04T14:35:52.6376055Z 2025-12-04T14:35:52.6376319Z Finished torch_np/test_scalars_0D_arrays 1/1 ... [2025-12-04 14:35:52.635505][20592.563796598], took 0.06min 2025-12-04T14:35:52.6749716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_scalars_0D_arrays/torch_np.test_scalars_0D_arrays-0d870852e9b8ecce.xml 2025-12-04T14:35:52.7127394Z Running test_cuda_primary_ctx 1/1 ... [2025-12-04 14:35:52.712514][20592.64081291] 2025-12-04T14:35:52.7127821Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:35:52.7130704Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_primary_ctx.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:52.712815] 2025-12-04T14:36:08.9167130Z 2025-12-04T14:36:08.9167967Z test_cuda_primary_ctx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_primary_ctx_1.1_e42f702cb40c5a59_.log 2025-12-04T14:36:08.9169802Z Running 4 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_copy, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_pin_memory, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_set_device_0, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_str_repr 2025-12-04T14:36:08.9171128Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_copy 2025-12-04T14:36:08.9171635Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_pin_memory 2025-12-04T14:36:08.9172382Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_set_device_0 2025-12-04T14:36:08.9172903Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_str_repr 2025-12-04T14:36:08.9173200Z 2025-12-04T14:36:08.9173412Z Finished test_cuda_primary_ctx 1/1 ... 
[2025-12-04 14:36:08.916597][20608.844892977], took 0.27min 2025-12-04T14:36:08.9567526Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-ed390b613a87fd4b.xml 2025-12-04T14:36:09.0441194Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-30e4b3748ee506f0.xml 2025-12-04T14:36:09.0800156Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-a99d9031384e91f6.xml 2025-12-04T14:36:09.1227254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-f9602eb3757a7925.xml 2025-12-04T14:36:09.1561107Z Running profiler/test_profiler_tree 1/1 ... [2025-12-04 14:36:09.155852][20609.084151081] 2025-12-04T14:36:09.1561751Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:36:09.1565034Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler_tree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:36:09.156206] 2025-12-04T14:36:12.6789164Z 2025-12-04T14:36:12.6801256Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_8c9f11eb1f1e482c_.log 2025-12-04T14:36:12.6808201Z Running 10 items in this shard: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_detailed, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_with_stream, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory_and_stack, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_record_function, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_dispatch, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_function 2025-12-04T14:36:12.6814346Z 2025-12-04T14:36:12.6814771Z Finished profiler/test_profiler_tree 1/1 ... [2025-12-04 14:36:12.678603][20612.606895033], took 0.06min 2025-12-04T14:36:12.7194209Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_profiler_tree/profiler.test_profiler_tree-3ad20a65a3b0a20b.xml 2025-12-04T14:36:12.7664243Z Running torch_np/numpy_tests/lib/test_arraysetops 1/1 ... 
[2025-12-04 14:36:12.766166][20612.694465281] 2025-12-04T14:36:12.7664996Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:36:12.7667863Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_arraysetops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:12.766486] 2025-12-04T14:36:16.2880064Z 2025-12-04T14:36:16.2881059Z torch_np/numpy_tests/lib/test_arraysetops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_arraysetops_1.1_330f28fa6e8cce7d_.log 2025-12-04T14:36:16.2898772Z Running 62 items in this shard: test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary0_prepend0_append_nan_expected_to_end, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary1_prepend1_append1_expected_to_begin, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary2_prepend_nan_append_nan_expected_to_begin, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary0_prepend_65536_append_65540_expected0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary1_prepend1_append1_expected1, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary2_prepend_0_append_0_expected2, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary3_prepend_3_append_-9_expected3, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind_sort, 
test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_both_arrays_are_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_both_arrays_have_structured_dtype, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_char_array, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_errors, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_first_array_is_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_hit_alternate_algorithm, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind0, 
test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_second_array_is_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_table_timedelta_fails, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_timedelta_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_timedelta_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_with_arrays_containing_tuples, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d_array_like, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d_indices, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_manyways, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d_char_array, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d_unique, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setxor1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_union1d, 
test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_2, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_with_axis_axis_-1, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_with_axis_axis_0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_errors, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_list, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_zeros, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_nanequals, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_sort_order_with_axis 2025-12-04T14:36:16.2915183Z 2025-12-04T14:36:16.2915540Z Finished torch_np/numpy_tests/lib/test_arraysetops 1/1 ... [2025-12-04 14:36:16.287798][20616.216088967], took 0.06min 2025-12-04T14:36:16.3278766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraysetops/torch_np.numpy_tests.lib.test_arraysetops-0f1d2c69ab44ce0e.xml 2025-12-04T14:36:16.3580868Z Running test_dlpack 1/1 ... [2025-12-04 14:36:16.357862][20616.286161747] 2025-12-04T14:36:16.3581272Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:36:16.3583839Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dlpack.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:36:16.358145] 2025-12-04T14:36:20.2812206Z 2025-12-04T14:36:20.2814228Z test_dlpack 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dlpack_1.1_1ec0373c5e8d3363_.log 2025-12-04T14:36:20.2860710Z Running 154 items in this shard: test/test_dlpack.py::TestTorchDlPackCUDA::test_automatically_select_in_creation_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_copy_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_bool, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float4_e2m1fn_x2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e4m3fn, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e4m3fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e5m2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e5m2fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e8m0fnu, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_complex128, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float4_e2m1fn_x2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e4m3fn, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e4m3fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e5m2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e5m2fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e8m0fnu, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_convert_default_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_cuda_per_thread_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_default_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_is_conj_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_non_strided_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_requires_grad_cuda, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_cpu_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_cuda_streams_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_rocm_streams_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_normalize_strides_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_shared_storage_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_complex64, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_on_different_device_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_unsupported_dtype_error_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint8, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int32, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_max_version_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_needs_copy_error_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_no_copy_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint8, 
test/test_dlpack.py::TestTorchDlPackCUDA::test_unsupported_device_error_cuda 2025-12-04T14:36:20.2898097Z 2025-12-04T14:36:20.2898293Z Finished test_dlpack 1/1 ... [2025-12-04 14:36:20.281218][20620.209510858], took 0.07min 2025-12-04T14:36:20.3210617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_dlpack/test_dlpack-41fa7f929a572602.xml 2025-12-04T14:36:20.3545346Z Running profiler/test_torch_tidy 1/1 ... [2025-12-04 14:36:20.354287][20620.282586988] 2025-12-04T14:36:20.3545865Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:36:20.3548203Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_torch_tidy.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:20.354561] 2025-12-04T14:36:26.7803206Z 2025-12-04T14:36:26.7804714Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_59c0964fd7f7a67f_.log 2025-12-04T14:36:26.7815711Z Running 22 items in this shard: test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_id_uniqueness, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids_with_other_ops, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocations, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_extra_fields, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_impl_reuse, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_mkldnn_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_module_and_optimizer_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_nnmodule_params, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer, 
test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_adam, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_sgd, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_pointers_and_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_refcounts, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_scalar_ins, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_sparse_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_lists, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_properties, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_full, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_keep_alive, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_scalar_args, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_set 2025-12-04T14:36:26.7821644Z 2025-12-04T14:36:26.7821879Z Finished profiler/test_torch_tidy 1/1 ... [2025-12-04 14:36:26.779926][20626.708218852], took 0.11min 2025-12-04T14:36:26.8201097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_torch_tidy/profiler.test_torch_tidy-6d6836cfdd083f06.xml 2025-12-04T14:36:26.8992610Z Running lazy/test_reuse_ir 1/1 ... [2025-12-04 14:36:26.899020][20626.82731895] 2025-12-04T14:36:26.8993040Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:36:26.8995792Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_reuse_ir.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:36:26.899318] 2025-12-04T14:36:30.3700886Z 2025-12-04T14:36:30.3701700Z lazy/test_reuse_ir 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_reuse_ir_1.1_27a380ba8ffdf251_.log 2025-12-04T14:36:30.3703551Z Running 4 items in this shard: test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAdd, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAddSub, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAddSubFallback, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testBatchNorm 2025-12-04T14:36:30.3704536Z 2025-12-04T14:36:30.3704778Z Finished lazy/test_reuse_ir 1/1 ... [2025-12-04 14:36:30.369828][20630.298126773], took 0.06min 2025-12-04T14:36:30.4088597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_reuse_ir/lazy.test_reuse_ir-927d5809a72b7fa4.xml 2025-12-04T14:36:30.4425714Z Running test_functional_autograd_benchmark 1/1 ... [2025-12-04 14:36:30.442344][20630.370643435] 2025-12-04T14:36:30.4426212Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:36:30.4429014Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functional_autograd_benchmark.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:30.442621] 2025-12-04T14:36:52.5447429Z 2025-12-04T14:36:52.5448429Z test_functional_autograd_benchmark 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_autograd_benchmark_1.1_a799b1a1c12db472_.log 2025-12-04T14:36:52.5449935Z Running 2 items in this shard: test/test_functional_autograd_benchmark.py::TestFunctionalAutogradBenchmark::test_fast_tasks, test/test_functional_autograd_benchmark.py::TestFunctionalAutogradBenchmark::test_slow_tasks 2025-12-04T14:36:52.5450759Z 2025-12-04T14:36:52.5451115Z Finished test_functional_autograd_benchmark 1/1 ... 
[2025-12-04 14:36:52.544519][20652.472814605], took 0.37min 2025-12-04T14:36:52.5852109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_functional_autograd_benchmark/test_functional_autograd_benchmark-0542bd395ed50334.xml 2025-12-04T14:36:52.6641421Z Running test_reductions 1/1 ... [2025-12-04 14:36:52.663852][20652.592150346] 2025-12-04T14:36:52.6641908Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:36:52.6644420Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:52.664182] 2025-12-04T14:39:04.4946292Z 2025-12-04T14:39:04.4947115Z test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_reductions_1.1_99f27656cb9f6489_.log 2025-12-04T14:39:04.6174704Z Running 4759 items in this shard: test/test_reductions.py::TestReductionsCUDA::test_accreal_type_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_all_any_with_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_issue117215_cuda, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_amin_amax_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_axis_with_dim_one_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_large_axis_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_bincount_cuda, test/test_reductions.py::TestReductionsCUDA::test_bucketization_cuda, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_cumprod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_cumsum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_any_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_mean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_any_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_logsumexp_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amax_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_prod_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_mean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmax_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_linalg_vector_norm_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_any_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nansum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmax_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_less_than_64_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_mean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nanmean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_prod_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histogram_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_histogram_error_handling_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogramdd_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_integral_promotion_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_max_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_max_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mean_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_int_with_optdtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_corner_cases_cuda, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_min_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_max_nan_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_minmax_illegal_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_boolean_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_device_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_numpy_named_args_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_bool_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_quantile_backward_cuda, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_quantile_error_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduce_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_empty_any_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_split_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_any_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_repeated_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_as_dim_argument_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_dim_compiled_mode_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_cpu_device_mismatch_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_reduction_uint8_overflow_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_out_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_parallel_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_argmax_argmix_kthvalue_dim_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_reduce_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_large_input_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_var_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability2_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float64 2025-12-04T14:39:04.7356999Z 2025-12-04T14:39:04.7357244Z Finished test_reductions 1/1 ... [2025-12-04 14:39:04.500020][20784.428313164], took 2.20min 2025-12-04T14:39:04.7357955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_reductions/test_reductions-113ccd6215a90199.xml 2025-12-04T14:39:04.7358619Z Running test_autoload_enable 1/1 ... [2025-12-04 14:39:04.681408][20784.609703684] 2025-12-04T14:39:04.9835879Z Processing /var/lib/jenkins/workspace/test/cpp_extensions 2025-12-04T14:39:07.7525011Z Preparing metadata (pyproject.toml) ... done 2025-12-04T14:39:07.7547279Z Building wheels for collected packages: torch_test_cpp_extension 2025-12-04T14:40:22.1322983Z Building wheel for torch_test_cpp_extension (pyproject.toml) ... 
done 2025-12-04T14:40:22.1441139Z Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13199649 sha256=de4334e69daa3b4e72f7ac2e1519bda7cdc82a85fc69a74402cb435a2aedcc2f 2025-12-04T14:40:22.1444525Z Stored in directory: /tmp/pip-ephem-wheel-cache-tn7_57g2/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1 2025-12-04T14:40:22.1461023Z Successfully built torch_test_cpp_extension 2025-12-04T14:40:22.6045785Z Installing collected packages: torch_test_cpp_extension 2025-12-04T14:40:22.7814637Z Successfully installed torch_test_cpp_extension-0.0.0 2025-12-04T14:40:25.0833107Z 2025-12-04T14:40:25.0833528Z Running tests... 2025-12-04T14:40:25.0833883Z ---------------------------------------------------------------------- 2025-12-04T14:40:25.3810920Z . 2025-12-04T14:40:25.3811294Z ---------------------------------------------------------------------- 2025-12-04T14:40:25.3811678Z Ran 1 test in 0.298s 2025-12-04T14:40:25.3811828Z 2025-12-04T14:40:25.3811902Z OK 2025-12-04T14:40:25.3812004Z 2025-12-04T14:40:25.3812103Z Generating XML reports... 2025-12-04T14:40:25.9521453Z Finished test_autoload_enable 1/1 ... [2025-12-04 14:40:25.951777][20865.880063863], took 1.35min 2025-12-04T14:40:25.9922743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204144025.xml 2025-12-04T14:40:29.3754817Z Running test batch 'tests to run' cost 20012.38 seconds 2025-12-04T14:40:29.3766857Z Emitting td_test_failure_stats_v2 2025-12-04T14:40:29.3770426Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2df5eaaad11f11f0bade0242ac110002 2025-12-04T14:40:29.4935226Z Done! 
Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2df5eaaad11f11f0bade0242ac110002 2025-12-04T14:40:29.4945576Z Emitting td_test_failure_stats_v2 2025-12-04T14:40:29.4947714Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2e07e282d11f11f0bade0242ac110002 2025-12-04T14:40:29.5265362Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2e07e282d11f11f0bade0242ac110002 2025-12-04T14:40:29.5266605Z inductor/test_cuda_select_algorithm 1/1 failed! 2025-12-04T14:40:29.5267124Z inductor/test_fp8 1/1 failed! 2025-12-04T14:40:30.1188880Z 2025-12-04T14:40:30.1189327Z real 333m37.579s 2025-12-04T14:40:30.1189720Z user 320m38.892s 2025-12-04T14:40:30.1190063Z sys 40m58.521s 2025-12-04T14:40:30.1190749Z + sccache_epilogue 2025-12-04T14:40:30.1191070Z + echo '::group::Sccache Compilation Log' 2025-12-04T14:40:30.1191688Z ##[group]Sccache Compilation Log 2025-12-04T14:40:30.1192020Z + echo '=================== sccache compilation log ===================' 2025-12-04T14:40:30.1192403Z =================== sccache compilation log =================== 2025-12-04T14:40:30.1192972Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-12-04T14:40:30.1312919Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-12-04T14:40:30.1313989Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-12-04T14:40:30.1314470Z + sccache --show-stats 2025-12-04T14:40:30.1338353Z Compile requests 4236 2025-12-04T14:40:30.1338774Z Compile requests executed 556 2025-12-04T14:40:30.1339143Z Cache hits 281 2025-12-04T14:40:30.1339518Z Cache hits (C/C++) 281 2025-12-04T14:40:30.1339751Z Cache misses 273 2025-12-04T14:40:30.1339964Z Cache misses (C/C++) 273 
2025-12-04T14:40:30.1340191Z Cache hits rate 50.72 % 2025-12-04T14:40:30.1340415Z Cache hits rate (C/C++) 50.72 % 2025-12-04T14:40:30.1340624Z Cache timeouts 0 2025-12-04T14:40:30.1340841Z Cache read errors 0 2025-12-04T14:40:30.1341048Z Forced recaches 0 2025-12-04T14:40:30.1341251Z Cache write errors 0 2025-12-04T14:40:30.1341459Z Cache errors 0 2025-12-04T14:40:30.1341835Z Compilations 273 2025-12-04T14:40:30.1342053Z Compilation failures 2 2025-12-04T14:40:30.1342272Z Non-cacheable compilations 0 2025-12-04T14:40:30.1342491Z Non-cacheable calls 299 2025-12-04T14:40:30.1342845Z Non-compilation calls 3381 2025-12-04T14:40:30.1343443Z Unsupported compiler calls 0 2025-12-04T14:40:30.1343824Z Average cache write 0.048 s 2025-12-04T14:40:30.1344059Z Average compiler 5.008 s 2025-12-04T14:40:30.1344275Z Average cache read hit 0.027 s 2025-12-04T14:40:30.1344513Z Failed distributed compilations 0 2025-12-04T14:40:30.1344669Z 2025-12-04T14:40:30.1344749Z Non-cacheable reasons: 2025-12-04T14:40:30.1344935Z unknown source language 241 2025-12-04T14:40:30.1345143Z -E 58 2025-12-04T14:40:30.1345297Z 2025-12-04T14:40:30.1345474Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T14:40:30.1345804Z Version (client) 0.10.0 2025-12-04T14:40:30.1346019Z + sccache --stop-server 2025-12-04T14:40:30.1360096Z Stopping sccache server... 
2025-12-04T14:40:30.1362971Z Compile requests 4236 2025-12-04T14:40:30.1363228Z Compile requests executed 556 2025-12-04T14:40:30.1363474Z Cache hits 281 2025-12-04T14:40:30.1363687Z Cache hits (C/C++) 281 2025-12-04T14:40:30.1363907Z Cache misses 273 2025-12-04T14:40:30.1364105Z Cache misses (C/C++) 273 2025-12-04T14:40:30.1364335Z Cache hits rate 50.72 % 2025-12-04T14:40:30.1364559Z Cache hits rate (C/C++) 50.72 % 2025-12-04T14:40:30.1364765Z Cache timeouts 0 2025-12-04T14:40:30.1364988Z Cache read errors 0 2025-12-04T14:40:30.1365200Z Forced recaches 0 2025-12-04T14:40:30.1365398Z Cache write errors 0 2025-12-04T14:40:30.1365607Z Cache errors 0 2025-12-04T14:40:30.1365812Z Compilations 273 2025-12-04T14:40:30.1366038Z Compilation failures 2 2025-12-04T14:40:30.1366255Z Non-cacheable compilations 0 2025-12-04T14:40:30.1366722Z Non-cacheable calls 299 2025-12-04T14:40:30.1367136Z Non-compilation calls 3381 2025-12-04T14:40:30.1367528Z Unsupported compiler calls 0 2025-12-04T14:40:30.1367809Z Average cache write 0.048 s 2025-12-04T14:40:30.1368040Z Average compiler 5.008 s 2025-12-04T14:40:30.1368259Z Average cache read hit 0.027 s 2025-12-04T14:40:30.1368486Z Failed distributed compilations 0 2025-12-04T14:40:30.1368634Z 2025-12-04T14:40:30.1368711Z Non-cacheable reasons: 2025-12-04T14:40:30.1368899Z unknown source language 241 2025-12-04T14:40:30.1369115Z -E 58 2025-12-04T14:40:30.1369262Z 2025-12-04T14:40:30.1369435Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T14:40:30.1369762Z Version (client) 0.10.0 2025-12-04T14:40:30.1369990Z + echo ::endgroup:: 2025-12-04T14:40:30.1370357Z ##[endgroup] 2025-12-04T14:40:30.1370514Z + cleanup_workspace 2025-12-04T14:40:30.1370883Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.' 2025-12-04T14:40:30.1371432Z sudo may print the following warning message that can be ignored. The chown command will still run. 
2025-12-04T14:40:30.1371873Z + echo ' sudo: setrlimit(RLIMIT_STACK): Operation not permitted' 2025-12-04T14:40:30.1372191Z sudo: setrlimit(RLIMIT_STACK): Operation not permitted 2025-12-04T14:40:30.1372575Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42' 2025-12-04T14:40:30.1372979Z For more details refer to https://github.com/sudo-project/sudo/issues/42 2025-12-04T14:40:30.1373376Z + sudo chown -R 1000 /var/lib/jenkins/workspace 2025-12-04T14:40:31.1379206Z ##[error]Process completed with exit code 1. 2025-12-04T14:40:31.1446367Z Prepare all required actions 2025-12-04T14:40:31.1446718Z Getting action download info 2025-12-04T14:40:31.3202645Z ##[group]Run ./.github/actions/pytest-cache-upload 2025-12-04T14:40:31.3203086Z with: 2025-12-04T14:40:31.3203265Z cache_dir: .pytest_cache 2025-12-04T14:40:31.3203462Z shard: 2 2025-12-04T14:40:31.3203651Z sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T14:40:31.3203902Z test_config: default 2025-12-04T14:40:31.3204146Z job_identifier: trunk_linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T14:40:31.3204415Z env: 2025-12-04T14:40:31.3204576Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:31.3204775Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:31.3205018Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:31.3205431Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:31.3205801Z ##[endgroup] 2025-12-04T14:40:31.3233780Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T14:40:31.3234033Z with: 2025-12-04T14:40:31.3234181Z shell: bash 2025-12-04T14:40:31.3234350Z timeout_minutes: 5 2025-12-04T14:40:31.3234520Z max_attempts: 5 2025-12-04T14:40:31.3234698Z retry_wait_seconds: 30 2025-12-04T14:40:31.3234958Z command: set -eu python3 -m pip install boto3==1.35.42 2025-12-04T14:40:31.3235233Z polling_interval_seconds: 1 2025-12-04T14:40:31.3235432Z warning_on_retry: true 2025-12-04T14:40:31.3235609Z continue_on_error: false 
2025-12-04T14:40:31.3235796Z env: 2025-12-04T14:40:31.3235945Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:31.3236125Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:31.3236349Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:31.3236734Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:31.3237075Z ##[endgroup] 2025-12-04T14:40:31.7550283Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T14:40:32.8388335Z Collecting boto3==1.35.42 2025-12-04T14:40:32.8535363Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB) 2025-12-04T14:40:32.8672235Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0) 2025-12-04T14:40:33.9980749Z Collecting botocore<1.36.0,>=1.35.42 2025-12-04T14:40:34.0011654Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB) 2025-12-04T14:40:34.1992171Z Collecting s3transfer<0.11.0,>=0.10.0 2025-12-04T14:40:34.2026404Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB) 2025-12-04T14:40:34.2107229Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1) 2025-12-04T14:40:34.2116790Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10) 2025-12-04T14:40:34.4021511Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0) 2025-12-04T14:40:34.4818333Z Installing collected packages: botocore, s3transfer, boto3 2025-12-04T14:40:35.0402508Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4 2025-12-04T14:40:35.3958727Z Command completed after 1 attempt(s). 
2025-12-04T14:40:35.4019347Z ##[group]Run python3 .github/scripts/pytest_cache.py \ 2025-12-04T14:40:35.4019686Z python3 .github/scripts/pytest_cache.py \ 2025-12-04T14:40:35.4019951Z  --upload \ 2025-12-04T14:40:35.4020179Z  --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \ 2025-12-04T14:40:35.4020450Z  --pr_identifier "$GITHUB_REF" \ 2025-12-04T14:40:35.4020714Z  --job_identifier "$JOB_IDENTIFIER" \ 2025-12-04T14:40:35.4020951Z  --sha "$SHA" \ 2025-12-04T14:40:35.4021149Z  --test_config "$TEST_CONFIG" \ 2025-12-04T14:40:35.4021393Z  --shard "$SHARD" \ 2025-12-04T14:40:35.4021604Z  --repo "$REPO" \ 2025-12-04T14:40:35.4022010Z  --temp_dir "$RUNNER_TEMP" \ 2025-12-04T14:40:35.4034056Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:35.4034337Z env: 2025-12-04T14:40:35.4034694Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:35.4034893Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:35.4035132Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:35.4035527Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:35.4035896Z CACHE_DIR: .pytest_cache 2025-12-04T14:40:35.4036135Z JOB_IDENTIFIER: trunk_linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T14:40:35.4036427Z SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T14:40:35.4036666Z TEST_CONFIG: default 2025-12-04T14:40:35.4036830Z SHARD: 2 2025-12-04T14:40:35.4036988Z REPO: pytorch/pytorch 2025-12-04T14:40:35.4037168Z ##[endgroup] 2025-12-04T14:40:35.8337745Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013` 2025-12-04T14:40:35.8339556Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='trunk_linux-jammy-cuda12.8-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='default', shard='2', repo='pytorch/pytorch', 
temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None) 2025-12-04T14:40:35.8341498Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache 2025-12-04T14:40:35.8342595Z to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2 2025-12-04T14:40:35.8344026Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip 2025-12-04T14:40:35.8345267Z to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip 2025-12-04T14:40:35.8713554Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T14:40:35.8713893Z cat test/**/*_toprint.log || true 2025-12-04T14:40:35.8722279Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:35.8722559Z env: 2025-12-04T14:40:35.8722721Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:35.8722906Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:35.8723136Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:35.8723544Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:35.8723904Z ##[endgroup] 2025-12-04T14:40:35.8827423Z cat: 'test/**/*_toprint.log': No such file or directory 2025-12-04T14:40:35.8853124Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2025-12-04T14:40:35.8853428Z kill "$MONITOR_SCRIPT_PID" 2025-12-04T14:40:35.8860384Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:35.8860659Z env: 2025-12-04T14:40:35.8860817Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:35.8861017Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:35.8861248Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 
2025-12-04T14:40:35.8861654Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:35.8862020Z MONITOR_SCRIPT_PID: 58792 2025-12-04T14:40:35.8862209Z ##[endgroup] 2025-12-04T14:40:35.8887446Z /home/ec2-user/actions-runner/_work/_temp/76cb5f06-528c-4af5-8335-b307bb0b82bb.sh: line 1: kill: (58792) - No such process 2025-12-04T14:40:35.8890098Z ##[error]Process completed with exit code 1. 2025-12-04T14:40:35.8982701Z Prepare all required actions 2025-12-04T14:40:35.8983073Z Getting action download info 2025-12-04T14:40:36.1137940Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T14:40:36.3212644Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T14:40:36.6503185Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T14:40:36.6503648Z with: 2025-12-04T14:40:36.6503965Z file-suffix: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T14:40:36.6504349Z s3-bucket: gha-artifacts 2025-12-04T14:40:36.6504548Z env: 2025-12-04T14:40:36.6504706Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:36.6504908Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:36.6505155Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:36.6505575Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:36.6505964Z ##[endgroup] 2025-12-04T14:40:36.6526599Z ##[group]Run # Remove any previous test jsons if they exist 2025-12-04T14:40:36.6526937Z # Remove any previous test jsons if they exist 2025-12-04T14:40:36.6527204Z rm -f test-jsons-*.zip 2025-12-04T14:40:36.6527513Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json' 2025-12-04T14:40:36.6535179Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:36.6535468Z env: 2025-12-04T14:40:36.6535626Z GIT_DEFAULT_BRANCH: main 
2025-12-04T14:40:36.6535817Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:36.6536041Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:36.6536436Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:36.6536933Z FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T14:40:36.6537283Z ##[endgroup] 2025-12-04T14:40:36.6762158Z adding: test/test-reports/td_exclusions-c7fdf23ebe9f44ecabec.json (deflated 82%) 2025-12-04T14:40:36.6773936Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d77224b10dd1e10b.json (deflated 94%) 2025-12-04T14:40:36.6840839Z adding: test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-87366e2d7057b5b0.json (deflated 92%) 2025-12-04T14:40:36.6841749Z adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-f32aa134ae4d7a45.json (deflated 94%) 2025-12-04T14:40:36.6842633Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.json (deflated 87%) 2025-12-04T14:40:36.6843530Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.json (deflated 86%) 2025-12-04T14:40:36.6844408Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.json (deflated 86%) 2025-12-04T14:40:36.6845271Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.json (deflated 87%) 2025-12-04T14:40:36.6846132Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.json (deflated 86%) 2025-12-04T14:40:36.6847001Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.json (deflated 86%) 2025-12-04T14:40:36.6847857Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.json (deflated 87%) 2025-12-04T14:40:36.6848714Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.json (deflated 86%) 2025-12-04T14:40:36.6849802Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.json (deflated 86%) 2025-12-04T14:40:36.6850711Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.json (deflated 87%) 2025-12-04T14:40:36.6851586Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.json (deflated 86%) 2025-12-04T14:40:36.6852585Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.json (deflated 86%) 2025-12-04T14:40:36.6853445Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.json (deflated 87%) 2025-12-04T14:40:36.6854312Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.json (deflated 86%) 2025-12-04T14:40:36.6855171Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.json (deflated 86%) 2025-12-04T14:40:36.6856030Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.json (deflated 87%) 2025-12-04T14:40:36.6856879Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.json (deflated 86%) 2025-12-04T14:40:36.6857808Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.json (deflated 86%) 2025-12-04T14:40:36.6858775Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.json (deflated 87%) 2025-12-04T14:40:36.6859713Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.json (deflated 86%) 2025-12-04T14:40:36.6860729Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.json (deflated 86%) 2025-12-04T14:40:36.6861723Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.json (deflated 87%) 2025-12-04T14:40:36.6862673Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.json (deflated 86%) 2025-12-04T14:40:36.6863682Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.json (deflated 86%) 2025-12-04T14:40:36.6864619Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.json (deflated 87%) 2025-12-04T14:40:36.6865566Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.json (deflated 86%) 2025-12-04T14:40:36.6866558Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.json (deflated 86%) 2025-12-04T14:40:36.6867520Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.json (deflated 87%) 2025-12-04T14:40:36.6868460Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.json (deflated 86%) 2025-12-04T14:40:36.6869434Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.json (deflated 86%) 2025-12-04T14:40:36.6870401Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.json (deflated 87%) 2025-12-04T14:40:36.6871460Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.json (deflated 86%) 2025-12-04T14:40:36.6872492Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.json (deflated 86%) 2025-12-04T14:40:36.6873437Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.json (deflated 87%) 2025-12-04T14:40:36.6874441Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.json (deflated 86%) 2025-12-04T14:40:36.6875475Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.json (deflated 86%) 2025-12-04T14:40:36.6876435Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.json (deflated 87%) 2025-12-04T14:40:36.6877457Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.json (deflated 86%) 2025-12-04T14:40:36.6878407Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.json (deflated 86%) 2025-12-04T14:40:36.6879322Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.json (deflated 87%) 2025-12-04T14:40:36.6880454Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.json (deflated 86%) 2025-12-04T14:40:36.6881407Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.json (deflated 86%) 2025-12-04T14:40:36.6882363Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.json (deflated 87%) 2025-12-04T14:40:36.6883381Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.json (deflated 86%) 2025-12-04T14:40:36.6884313Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.json (deflated 86%) 2025-12-04T14:40:36.6885286Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.json (deflated 87%) 2025-12-04T14:40:36.6886301Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.json (deflated 86%) 2025-12-04T14:40:36.6887253Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.json (deflated 86%) 2025-12-04T14:40:36.6888262Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.json (deflated 87%) 2025-12-04T14:40:36.6889202Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.json (deflated 86%) 2025-12-04T14:40:36.6890240Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.json (deflated 86%) 2025-12-04T14:40:36.6891228Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.json (deflated 87%) 2025-12-04T14:40:36.6892212Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.json (deflated 86%) 2025-12-04T14:40:36.6893299Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.json (deflated 86%) 2025-12-04T14:40:36.6894281Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.json (deflated 87%) 2025-12-04T14:40:36.6895239Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.json (deflated 86%) 2025-12-04T14:40:36.6896164Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.json (deflated 86%) 2025-12-04T14:40:36.6897279Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.json (deflated 87%) 2025-12-04T14:40:36.6898233Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.json (deflated 86%) 2025-12-04T14:40:36.6899133Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.json (deflated 86%) 2025-12-04T14:40:36.6900243Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.json (deflated 87%) 2025-12-04T14:40:36.6901186Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.json (deflated 86%) 2025-12-04T14:40:36.6902177Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.json (deflated 86%) 2025-12-04T14:40:36.6903122Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.json (deflated 87%) 2025-12-04T14:40:36.6904068Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.json (deflated 86%) 2025-12-04T14:40:36.6905069Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.json (deflated 86%) 2025-12-04T14:40:36.6906058Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.json (deflated 87%) 2025-12-04T14:40:36.6907004Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.json (deflated 86%) 2025-12-04T14:40:36.6907957Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.json (deflated 86%) 2025-12-04T14:40:36.6908934Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.json (deflated 87%) 2025-12-04T14:40:36.6909884Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.json (deflated 86%) 2025-12-04T14:40:36.6910891Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.json (deflated 86%) 2025-12-04T14:40:36.6911898Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.json (deflated 87%) 2025-12-04T14:40:36.6912786Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.json (deflated 86%) 2025-12-04T14:40:36.6913814Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.json (deflated 86%) 2025-12-04T14:40:36.6914762Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.json (deflated 87%) 2025-12-04T14:40:36.6915864Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.json (deflated 86%) 2025-12-04T14:40:36.6916835Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.json (deflated 86%) 2025-12-04T14:40:36.6918060Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.json (deflated 87%) 2025-12-04T14:40:36.6919262Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.json (deflated 86%) 2025-12-04T14:40:36.6920303Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.json (deflated 86%) 2025-12-04T14:40:36.6921277Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.json (deflated 87%) 2025-12-04T14:40:36.6922363Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.json (deflated 86%) 2025-12-04T14:40:36.6923291Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.json (deflated 86%) 2025-12-04T14:40:36.6924260Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.json (deflated 87%) 2025-12-04T14:40:36.6925269Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.json (deflated 86%) 2025-12-04T14:40:36.6926231Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.json (deflated 86%) 2025-12-04T14:40:36.6927143Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.json (deflated 87%) 2025-12-04T14:40:36.6928166Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.json (deflated 86%) 2025-12-04T14:40:36.6929165Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.json (deflated 86%) 2025-12-04T14:40:36.6930166Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.json (deflated 87%) 2025-12-04T14:40:36.6931138Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.json (deflated 86%) 2025-12-04T14:40:36.6932076Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.json (deflated 86%) 2025-12-04T14:40:36.6933065Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.json (deflated 87%) 2025-12-04T14:40:36.6934109Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.json (deflated 86%) 2025-12-04T14:40:36.6935069Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.json (deflated 86%) 2025-12-04T14:40:36.6936073Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.json (deflated 87%) 2025-12-04T14:40:36.6937036Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.json (deflated 86%) 2025-12-04T14:40:36.6937993Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.json (deflated 86%) 2025-12-04T14:40:36.6939160Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.json (deflated 87%) 2025-12-04T14:40:36.6940135Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.json (deflated 86%) 2025-12-04T14:40:36.6941072Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.json (deflated 86%) 2025-12-04T14:40:36.6942171Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.json (deflated 87%) 2025-12-04T14:40:36.6943128Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.json (deflated 86%) 2025-12-04T14:40:36.6944141Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.json (deflated 86%) 2025-12-04T14:40:36.6945169Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.json (deflated 87%) 2025-12-04T14:40:36.6946119Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.json (deflated 86%) 2025-12-04T14:40:36.6947065Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.json (deflated 86%) 2025-12-04T14:40:36.6948059Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.json (deflated 87%) 2025-12-04T14:40:36.6949007Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.json (deflated 86%) 2025-12-04T14:40:36.6950019Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.json (deflated 86%) 2025-12-04T14:40:36.6950929Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.json (deflated 87%) 2025-12-04T14:40:36.6951866Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.json (deflated 86%) 2025-12-04T14:40:36.6952879Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.json (deflated 86%) 2025-12-04T14:40:36.6953829Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.json (deflated 87%) 2025-12-04T14:40:36.6954851Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.json (deflated 86%) 2025-12-04T14:40:36.6955824Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.json (deflated 86%) 2025-12-04T14:40:36.6956778Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.json (deflated 87%) 2025-12-04T14:40:36.6957728Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.json (deflated 86%) 2025-12-04T14:40:36.6958746Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.json (deflated 86%) 2025-12-04T14:40:36.6959688Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.json (deflated 87%) 2025-12-04T14:40:36.6960692Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.json (deflated 86%) 2025-12-04T14:40:36.6961789Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.json (deflated 86%) 2025-12-04T14:40:36.6962764Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.json (deflated 87%) 2025-12-04T14:40:36.6963765Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.json (deflated 86%) 2025-12-04T14:40:36.6964771Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.json (deflated 86%) 2025-12-04T14:40:36.6965806Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.json (deflated 87%) 2025-12-04T14:40:36.6966825Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.json (deflated 86%) 2025-12-04T14:40:36.6967791Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.json (deflated 86%) 2025-12-04T14:40:36.6968746Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.json (deflated 87%) 2025-12-04T14:40:36.6969739Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.json (deflated 86%) 2025-12-04T14:40:36.6970709Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.json (deflated 86%) 2025-12-04T14:40:36.6971632Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.json (deflated 87%) 2025-12-04T14:40:36.6972637Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.json (deflated 86%) 2025-12-04T14:40:36.6973598Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.json (deflated 86%) 2025-12-04T14:40:36.6974503Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.json (deflated 87%) 2025-12-04T14:40:36.6975535Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.json (deflated 86%) 2025-12-04T14:40:36.6976552Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.json (deflated 86%) 2025-12-04T14:40:36.6977526Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.json (deflated 87%) 2025-12-04T14:40:36.6978503Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.json (deflated 86%) 2025-12-04T14:40:36.6979424Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.json (deflated 86%) 2025-12-04T14:40:36.6980406Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.json (deflated 87%) 2025-12-04T14:40:36.6981393Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.json (deflated 86%) 2025-12-04T14:40:36.6982321Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.json (deflated 86%) 2025-12-04T14:40:36.6983457Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.json (deflated 87%) 2025-12-04T14:40:36.6984377Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.json (deflated 86%) 2025-12-04T14:40:36.6985319Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.json (deflated 86%) 2025-12-04T14:40:36.6986350Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.json (deflated 87%) 2025-12-04T14:40:36.6987454Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.json (deflated 86%) 2025-12-04T14:40:36.6988453Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.json (deflated 86%) 2025-12-04T14:40:36.6989393Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.json (deflated 87%) 2025-12-04T14:40:36.6990339Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.json (deflated 86%) 2025-12-04T14:40:36.6991342Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.json (deflated 86%) 2025-12-04T14:40:36.6992309Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.json (deflated 87%) 2025-12-04T14:40:36.6993227Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.json (deflated 86%) 2025-12-04T14:40:36.6994229Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.json (deflated 86%) 2025-12-04T14:40:36.6995194Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.json (deflated 87%) 2025-12-04T14:40:36.6996163Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.json (deflated 86%) 2025-12-04T14:40:36.6997154Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.json (deflated 86%) 2025-12-04T14:40:36.6998139Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.json (deflated 87%) 2025-12-04T14:40:36.6999104Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.json (deflated 86%) 2025-12-04T14:40:36.7000234Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.json (deflated 86%) 2025-12-04T14:40:36.7001211Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.json (deflated 87%) 2025-12-04T14:40:36.7002151Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.json (deflated 86%) 2025-12-04T14:40:36.7003123Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.json (deflated 86%) 2025-12-04T14:40:36.7004100Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.json (deflated 87%) 2025-12-04T14:40:36.7005038Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.json (deflated 86%) 2025-12-04T14:40:36.7006155Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.json (deflated 86%) 2025-12-04T14:40:36.7007078Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.json (deflated 87%) 2025-12-04T14:40:36.7008017Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.json (deflated 86%) 2025-12-04T14:40:36.7009181Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.json (deflated 86%) 2025-12-04T14:40:36.7010136Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.json (deflated 87%) 2025-12-04T14:40:36.7011128Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.json (deflated 86%) 2025-12-04T14:40:36.7012086Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.json (deflated 86%) 2025-12-04T14:40:36.7013032Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.json (stored 0%) 2025-12-04T14:40:36.7040904Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-2cb94ab29d3c0df8.json (deflated 97%) 2025-12-04T14:40:36.7048698Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-ab5aa4d4069f84fb.json (deflated 95%) 2025-12-04T14:40:36.7056431Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-35f77a6004c12574.json (deflated 95%) 2025-12-04T14:40:36.7063732Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-4e86df63b0f31fa2.json (deflated 95%) 2025-12-04T14:40:36.7071650Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-421be54fd2475226.json (deflated 95%) 2025-12-04T14:40:36.7079326Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-269f711d29febed7.json (deflated 95%) 2025-12-04T14:40:36.7151742Z adding: test/test-reports/python-pytest/test_ops/test_ops-f35e359ea3f52347.json (deflated 96%) 2025-12-04T14:40:36.7236766Z adding: test/test-reports/python-pytest/test_ops/test_ops-7327fc5de50caef8.json (deflated 96%) 2025-12-04T14:40:36.7270124Z adding: 
test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-e6d2768dce09d0dd.json (deflated 94%) 2025-12-04T14:40:36.7275467Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-52326583abfcb307.json (deflated 96%) 2025-12-04T14:40:36.7280870Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-b0e5cfa73b17bf79.json (deflated 96%) 2025-12-04T14:40:36.7286377Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-44c334397fb0c3bd.json (deflated 96%) 2025-12-04T14:40:36.7292781Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3098e6f6c63481df.json (deflated 92%) 2025-12-04T14:40:36.7327074Z adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-d8fc516c8be54fc6.json (deflated 93%) 2025-12-04T14:40:36.7328114Z adding: test/test-reports/python-pytest/inductor.test_layout_optim/inductor.test_layout_optim-ff0e0fc528f4f3dd.json (stored 0%) 2025-12-04T14:40:36.7342831Z adding: test/test-reports/python-pytest/dynamo.test_exc/dynamo.test_exc-59dcc92175511a1b.json (deflated 95%) 2025-12-04T14:40:36.7350521Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_arrayref/inductor.test_aot_inductor_arrayref-c35059cecd7c3b99.json (deflated 95%) 2025-12-04T14:40:36.7351836Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccee7b90c33901e0.json (deflated 88%) 2025-12-04T14:40:36.7352859Z adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-4527efee43b2418d.json (deflated 76%) 2025-12-04T14:40:36.7353815Z adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-1bfa13dfa66ba37a.json (deflated 72%) 
2025-12-04T14:40:36.7354649Z adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-d291715b5fb08603.json (deflated 92%) 2025-12-04T14:40:36.7355547Z adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-8f5f76cf24e3a322.json (deflated 34%) 2025-12-04T14:40:36.7356275Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.json (deflated 93%) 2025-12-04T14:40:36.7356935Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.json (deflated 85%) 2025-12-04T14:40:36.7357579Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.json (deflated 85%) 2025-12-04T14:40:36.7361109Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.json (deflated 94%) 2025-12-04T14:40:36.7361781Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.json (deflated 74%) 2025-12-04T14:40:36.7368831Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.json (deflated 97%) 2025-12-04T14:40:36.7370666Z adding: test/test-reports/python-pytest/inductor.test_flex_flash/inductor.test_flex_flash-594b02277dbafddb.json (deflated 97%) 2025-12-04T14:40:36.7371439Z adding: test/test-reports/python-pytest/inductor.test_segmented_tree/inductor.test_segmented_tree-3edcd06141f9e439.json (deflated 90%) 2025-12-04T14:40:36.7372275Z adding: test/test-reports/python-pytest/inductor.test_kernel_optimization/inductor.test_kernel_optimization-b711ca038cf9dded.json (deflated 38%) 2025-12-04T14:40:36.7373067Z adding: test/test-reports/python-pytest/inductor.test_metrics/inductor.test_metrics-b803ef0c4a9491e7.json (deflated 78%) 2025-12-04T14:40:36.7373867Z adding: test/test-reports/python-pytest/export.test_unflatten_training_ir/export.test_unflatten_training_ir-0b06d16f89271e02.json (deflated 
94%) 2025-12-04T14:40:36.7390749Z adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-31757cb9ac5c1c41.json (deflated 95%) 2025-12-04T14:40:36.7391764Z adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-0fae1aaa4a2003eb.json (deflated 92%) 2025-12-04T14:40:36.7392787Z adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-bc092756daa3cb92.json (deflated 84%) 2025-12-04T14:40:36.7474137Z adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-707c3510a208c1b4.json (deflated 95%) 2025-12-04T14:40:36.7475056Z adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-c5f7586c63bf6a73.json (deflated 47%) 2025-12-04T14:40:36.7476022Z adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-a0634e57d5b5356a.json (deflated 91%) 2025-12-04T14:40:36.7476983Z adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-31b2988c053252a6.json (deflated 69%) 2025-12-04T14:40:36.7477955Z adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-84688fa1768a881f.json (deflated 54%) 2025-12-04T14:40:36.7478903Z adding: test/test-reports/python-pytest/inductor.test_best_config/inductor.test_best_config-423c961bf0014ead.json (deflated 52%) 2025-12-04T14:40:36.7479785Z adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-21258e52645f11ac.json (deflated 56%) 2025-12-04T14:40:36.7491664Z adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-bb39f909a1ebabd7.json (deflated 96%) 2025-12-04T14:40:36.7493908Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_custom_ops/inductor.test_aot_inductor_custom_ops-a7a529277f9f9a31.json (deflated 94%) 2025-12-04T14:40:36.7506497Z adding: 
test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-955a1f614439a238.json (deflated 97%) 2025-12-04T14:40:36.7507798Z adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-e1e65156309c3950.json (deflated 87%) 2025-12-04T14:40:36.7508886Z adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e4d33245d4b5500b.json (deflated 91%) 2025-12-04T14:40:36.7511220Z adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-212782fb09fba1b6.json (deflated 90%) 2025-12-04T14:40:36.7512296Z adding: test/test-reports/python-pytest/inductor.test_needs_exact_strides/inductor.test_needs_exact_strides-9df9f4b52deeb26d.json (deflated 70%) 2025-12-04T14:40:36.7524842Z adding: test/test-reports/python-pytest/inductor.test_auto_functionalize/inductor.test_auto_functionalize-2721b330ad87dcbb.json (deflated 95%) 2025-12-04T14:40:36.7525980Z adding: test/test-reports/python-pytest/dynamo.test_modes/dynamo.test_modes-84ffae5b03f7e325.json (deflated 85%) 2025-12-04T14:40:36.7527016Z adding: test/test-reports/python-pytest/inductor.test_custom_partitioner_fn/inductor.test_custom_partitioner_fn-7cdf0df133f39710.json (deflated 50%) 2025-12-04T14:40:36.7528045Z adding: test/test-reports/python-pytest/dynamo.test_debug_utils/dynamo.test_debug_utils-34b85abff78e1075.json (deflated 76%) 2025-12-04T14:40:36.7528961Z adding: test/test-reports/python-pytest/dynamo.test_base_hop/dynamo.test_base_hop-f16fc4276458e68a.json (deflated 79%) 2025-12-04T14:40:36.7536382Z adding: test/test-reports/python-pytest/dynamo.test_export/dynamo.test_export-b39b0ef66a188fdf.json (deflated 90%) 2025-12-04T14:40:36.7537327Z adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-89e5f4289c609732.json (deflated 84%) 2025-12-04T14:40:36.7538244Z adding: 
test/test-reports/python-pytest/export.test_swap/export.test_swap-4cc8060b16634bb1.json (deflated 94%) 2025-12-04T14:40:36.7539487Z adding: test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-9dcd937885307cef.json (deflated 94%) 2025-12-04T14:40:36.7540443Z adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-22b40053bc190597.json (deflated 72%) 2025-12-04T14:40:36.7541718Z adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-e50f738759450405.json (deflated 87%) 2025-12-04T14:40:36.7543133Z adding: test/test-reports/python-pytest/dynamo.test_cudagraphs_expandable_segments/dynamo.test_cudagraphs_expandable_segments-6088bb8977cfc034.json (deflated 87%) 2025-12-04T14:40:36.7546313Z adding: test/test-reports/python-pytest/inductor.test_caching/inductor.test_caching-81c36c30e8c9f16c.json (deflated 97%) 2025-12-04T14:40:36.7547504Z adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-8774cf5ade30b7b9.json (deflated 87%) 2025-12-04T14:40:36.7552099Z adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-debad3a483737c49.json (deflated 93%) 2025-12-04T14:40:36.7553095Z adding: test/test-reports/python-pytest/dynamo.test_comptime/dynamo.test_comptime-47f6d1d4947f2a1a.json (deflated 83%) 2025-12-04T14:40:36.7554029Z adding: test/test-reports/python-pytest/test_privateuseone_python_backend/test_privateuseone_python_backend-af92c89cf1e734fe.json (deflated 65%) 2025-12-04T14:40:36.7555013Z adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-2881d42f49f0d5f4.json (deflated 88%) 2025-12-04T14:40:36.7556118Z adding: test/test-reports/python-pytest/functorch.test_parsing/functorch.test_parsing-b19da695e97869b8.json (deflated 88%) 2025-12-04T14:40:36.7557001Z adding: 
test/test-reports/python-pytest/test_varlen_attention/test_varlen_attention-c77e9d85fce2d7de.json (deflated 90%) 2025-12-04T14:40:36.7557827Z adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-aebaabaedf0418a4.json (deflated 64%) 2025-12-04T14:40:36.7564089Z adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-b67630ba146aa7a3.json (deflated 97%) 2025-12-04T14:40:36.7565074Z adding: test/test-reports/python-pytest/test_autoload/test_autoload-f8ddaf02f0fba12a.json (deflated 36%) 2025-12-04T14:40:36.7565949Z adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-090bcad7e8d69cda.json (deflated 63%) 2025-12-04T14:40:36.7566812Z adding: test/test-reports/python-pytest/xpu.test_fusion/xpu.test_fusion-5eb21ed31dcf28c7.json (stored 0%) 2025-12-04T14:40:36.7615832Z adding: test/test-reports/python-pytest/test_foreach/test_foreach-08a78fa61936b219.json (deflated 98%) 2025-12-04T14:40:36.7617480Z adding: test/test-reports/python-pytest/test_pytree/test_pytree-863a0f55639901d8.json (deflated 96%) 2025-12-04T14:40:36.7618527Z adding: test/test-reports/python-pytest/test_namedtuple_return_api/test_namedtuple_return_api-8cf83ed9877ffdac.json (deflated 76%) 2025-12-04T14:40:36.7619551Z adding: test/test-reports/python-pytest/profiler.test_record_function/profiler.test_record_function-709377f3af2db71e.json (deflated 85%) 2025-12-04T14:40:36.7620543Z adding: test/test-reports/python-pytest/test_compile_benchmark_util/test_compile_benchmark_util-827ec108afd6e51f.json (deflated 85%) 2025-12-04T14:40:36.7621568Z adding: test/test-reports/python-pytest/test_set_default_mobile_cpu_allocator/test_set_default_mobile_cpu_allocator-a62b10a07a12c95d.json (deflated 66%) 2025-12-04T14:40:36.7626190Z adding: test/test-reports/python-pytest/test_fake_tensor/test_fake_tensor-aa30317e19bcd391.json (deflated 94%) 2025-12-04T14:40:36.7802531Z adding: 
test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-6264828c6395375f.json (deflated 98%) 2025-12-04T14:40:36.7982669Z adding: test/test-reports/python-pytest/test_meta/test_meta-be563441d2ad6907.json (deflated 97%) 2025-12-04T14:40:36.8009736Z adding: test/test-reports/python-pytest/test_fx/test_fx-26e481d16f3bec04.json (deflated 97%) 2025-12-04T14:40:36.8041264Z adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-e4282282966a7ff9.json (deflated 97%) 2025-12-04T14:40:36.8067302Z adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-59ab3583fe1f80dc.json (deflated 98%) 2025-12-04T14:40:36.8079563Z adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-5ef9f5430d478fb5.json (deflated 96%) 2025-12-04T14:40:36.8083468Z adding: test/test-reports/python-pytest/complex_tensor.test_complex_tensor/complex_tensor.test_complex_tensor-b8215f419723e2db.json (deflated 96%) 2025-12-04T14:40:36.8085051Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bedc5291ea06d7d4.json (deflated 97%) 2025-12-04T14:40:36.8112601Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-7c0d16c38d6c5d66.json (deflated 95%) 2025-12-04T14:40:36.8140046Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-53045e36c53c5e0b.json (deflated 95%) 2025-12-04T14:40:36.8141053Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-3f363f609079497e.json (deflated 91%) 2025-12-04T14:40:36.8146865Z adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-84ccb0941d397f92.json (deflated 98%) 2025-12-04T14:40:36.8151435Z adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-7140a1ac93a67fd6.json (deflated 96%) 2025-12-04T14:40:36.8208996Z adding: 
test/test-reports/python-pytest/test_nn/test_nn-9b13ea9411b68db1.json (deflated 97%) 2025-12-04T14:40:36.8210193Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-6bd6911b2987aa11.json (deflated 95%) 2025-12-04T14:40:36.8211515Z adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-a884db72e4413a95.json (deflated 91%) 2025-12-04T14:40:36.8214571Z adding: test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-21d4709389e5b8ee.json (deflated 96%) 2025-12-04T14:40:36.8217883Z adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-ee3bcbb253c53d76.json (deflated 97%) 2025-12-04T14:40:36.8218603Z adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-f847b013a309e4b4.json (deflated 88%) 2025-12-04T14:40:36.8219293Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d415fb392a79d6b5.json (deflated 35%) 2025-12-04T14:40:36.8219930Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-020323916fe208b4.json (deflated 33%) 2025-12-04T14:40:36.8220561Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-ec2ab5853a8f5bb5.json (deflated 33%) 2025-12-04T14:40:36.8221188Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2344770aa1066fe.json (deflated 34%) 2025-12-04T14:40:36.8221804Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7974fe8ff8b4bbf1.json (deflated 33%) 2025-12-04T14:40:36.8222431Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a89ca46644423e33.json (deflated 33%) 2025-12-04T14:40:36.8223039Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a24d5339b3f8a425.json (deflated 33%) 2025-12-04T14:40:36.8223660Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-e1f683ecde7e9859.json (deflated 33%) 2025-12-04T14:40:36.8224280Z adding: 
test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-18c9d7ef1c10ca2f.json (deflated 34%) 2025-12-04T14:40:36.8224897Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c34005545b418b69.json (deflated 33%) 2025-12-04T14:40:36.8225496Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-fef6b14731bb5c3b.json (deflated 33%) 2025-12-04T14:40:36.8226104Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-9f269287574f2f5d.json (deflated 33%) 2025-12-04T14:40:36.8226724Z adding: test/test-reports/python-pytest/test_native_mha/test_native_mha-92c62998ce15c273.json (deflated 95%) 2025-12-04T14:40:36.8227506Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numerictypes/torch_np.numpy_tests.core.test_numerictypes-794afd975a25dd08.json (deflated 95%) 2025-12-04T14:40:36.8228350Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-264e490a42609567.json (deflated 37%) 2025-12-04T14:40:36.8229089Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3424f861fbb3f30b.json (deflated 38%) 2025-12-04T14:40:36.8229824Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-34f31a11d054a5e2.json (deflated 38%) 2025-12-04T14:40:36.8230557Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-5cc2b0915fc09b1e.json (deflated 38%) 2025-12-04T14:40:36.8231286Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8b636380fe175cf1.json (deflated 38%) 2025-12-04T14:40:36.8232021Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3e5b65c914a199ce.json (deflated 38%) 2025-12-04T14:40:36.8232736Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-df0d139575237859.json (deflated 35%) 2025-12-04T14:40:36.8233625Z 
adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3264b6f40acb9dc5.json (deflated 37%) 2025-12-04T14:40:36.8234360Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8f705997ba04d72c.json (deflated 36%) 2025-12-04T14:40:36.8235066Z adding: test/test-reports/python-pytest/test_function_schema/test_function_schema-23d7f60e4862c430.json (deflated 91%) 2025-12-04T14:40:36.8235736Z adding: test/test-reports/python-pytest/test_accelerator/test_accelerator-96bd402fc51988fa.json (deflated 88%) 2025-12-04T14:40:36.8236458Z adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-4d2f3b79bbd3891e.json (deflated 91%) 2025-12-04T14:40:36.8237241Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_methods/torch_np.numpy_tests.core.test_scalar_methods-ab020ac5345dfbce.json (deflated 97%) 2025-12-04T14:40:36.8238141Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_helper/torch_np.numpy_tests.fft.test_helper-f56f2bb61bb011d4.json (deflated 86%) 2025-12-04T14:40:36.8238945Z adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-6eaefe04adaaf056.json (deflated 83%) 2025-12-04T14:40:36.8253247Z adding: test/test-reports/python-pytest/test_overrides/test_overrides-54542bcdcb986158.json (deflated 97%) 2025-12-04T14:40:36.8254157Z adding: test/test-reports/python-pytest/torch_np.test_function_base/torch_np.test_function_base-75280ab4c9ebfe9c.json (deflated 62%) 2025-12-04T14:40:36.8260768Z adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-263da3bcb01bdba5.json (deflated 98%) 2025-12-04T14:40:36.8261697Z adding: test/test-reports/python-pytest/torch_np.test_scalars_0D_arrays/torch_np.test_scalars_0D_arrays-0d870852e9b8ecce.json (deflated 96%) 2025-12-04T14:40:36.8262616Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-ed390b613a87fd4b.json (deflated 
43%) 2025-12-04T14:40:36.8263448Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-30e4b3748ee506f0.json (deflated 42%) 2025-12-04T14:40:36.8264134Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-a99d9031384e91f6.json (deflated 35%) 2025-12-04T14:40:36.8264807Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-f9602eb3757a7925.json (deflated 42%) 2025-12-04T14:40:36.8265550Z adding: test/test-reports/python-pytest/profiler.test_profiler_tree/profiler.test_profiler_tree-3ad20a65a3b0a20b.json (deflated 87%) 2025-12-04T14:40:36.8266415Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraysetops/torch_np.numpy_tests.lib.test_arraysetops-0f1d2c69ab44ce0e.json (deflated 96%) 2025-12-04T14:40:36.8267898Z adding: test/test-reports/python-pytest/test_dlpack/test_dlpack-41fa7f929a572602.json (deflated 97%) 2025-12-04T14:40:36.8268877Z adding: test/test-reports/python-pytest/profiler.test_torch_tidy/profiler.test_torch_tidy-6d6836cfdd083f06.json (deflated 85%) 2025-12-04T14:40:36.8269569Z adding: test/test-reports/python-pytest/lazy.test_reuse_ir/lazy.test_reuse_ir-927d5809a72b7fa4.json (deflated 78%) 2025-12-04T14:40:36.8270323Z adding: test/test-reports/python-pytest/test_functional_autograd_benchmark/test_functional_autograd_benchmark-0542bd395ed50334.json (deflated 63%) 2025-12-04T14:40:36.8333368Z adding: test/test-reports/python-pytest/test_reductions/test_reductions-113ccd6215a90199.json (deflated 98%) 2025-12-04T14:40:36.8334277Z adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204144025.json (deflated 37%) 2025-12-04T14:40:36.8361026Z ##[group]Run # Remove any previous test reports if they exist 2025-12-04T14:40:36.8361375Z # Remove any previous test reports if they exist 2025-12-04T14:40:36.8361654Z rm -f test-reports-*.zip 2025-12-04T14:40:36.8362056Z zip -r "test-reports-${FILE_SUFFIX}.zip" 
test/test-reports -i '*.xml' -i '*.csv' 2025-12-04T14:40:36.8370054Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:36.8370336Z env: 2025-12-04T14:40:36.8370493Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:36.8370679Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:36.8370901Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:36.8371281Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:36.8371789Z FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T14:40:36.8372265Z ##[endgroup] 2025-12-04T14:40:36.8520855Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d77224b10dd1e10b.xml (deflated 93%) 2025-12-04T14:40:36.8570654Z adding: test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-87366e2d7057b5b0.xml (deflated 92%) 2025-12-04T14:40:36.8582749Z adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-f32aa134ae4d7a45.xml (deflated 93%) 2025-12-04T14:40:36.8583784Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml (deflated 87%) 2025-12-04T14:40:36.8584855Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml (deflated 85%) 2025-12-04T14:40:36.8585904Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml (deflated 85%) 2025-12-04T14:40:36.8586983Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml (deflated 87%) 2025-12-04T14:40:36.8588031Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml (deflated 85%) 
2025-12-04T14:40:36.8589317Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml (deflated 85%) 2025-12-04T14:40:36.8590436Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml (deflated 87%) 2025-12-04T14:40:36.8591521Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml (deflated 85%) 2025-12-04T14:40:36.8592587Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml (deflated 85%) 2025-12-04T14:40:36.8593592Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml (deflated 87%) 2025-12-04T14:40:36.8594417Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml (deflated 85%) 2025-12-04T14:40:36.8595365Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml (deflated 85%) 2025-12-04T14:40:36.8596304Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml (deflated 86%) 2025-12-04T14:40:36.8597287Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml (deflated 85%) 2025-12-04T14:40:36.8598207Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml (deflated 85%) 2025-12-04T14:40:36.8599123Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml (deflated 86%) 2025-12-04T14:40:36.8600139Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml (deflated 85%) 2025-12-04T14:40:36.8601292Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml (deflated 85%) 2025-12-04T14:40:36.8602146Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml (deflated 86%) 2025-12-04T14:40:36.8603122Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml (deflated 85%) 2025-12-04T14:40:36.8604085Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml (deflated 85%) 2025-12-04T14:40:36.8604914Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml (deflated 86%) 2025-12-04T14:40:36.8605878Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml (deflated 85%) 2025-12-04T14:40:36.8606718Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml (deflated 85%) 2025-12-04T14:40:36.8607629Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml (deflated 86%) 2025-12-04T14:40:36.8608629Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml (deflated 85%) 2025-12-04T14:40:36.8609537Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml (deflated 85%) 2025-12-04T14:40:36.8610419Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml (deflated 86%) 2025-12-04T14:40:36.8611317Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml (deflated 85%) 2025-12-04T14:40:36.8612212Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml (deflated 85%) 2025-12-04T14:40:36.8613046Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml (deflated 86%) 2025-12-04T14:40:36.8614010Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml (deflated 85%) 2025-12-04T14:40:36.8614936Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml (deflated 85%) 2025-12-04T14:40:36.8615782Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml (deflated 86%) 2025-12-04T14:40:36.8616726Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml (deflated 85%) 2025-12-04T14:40:36.8617888Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml (deflated 85%) 2025-12-04T14:40:36.8618718Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml (deflated 86%) 2025-12-04T14:40:36.8619678Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml (deflated 85%) 2025-12-04T14:40:36.8620511Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml (deflated 85%) 2025-12-04T14:40:36.8621344Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml (deflated 86%) 2025-12-04T14:40:36.8622465Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml (deflated 85%) 2025-12-04T14:40:36.8623306Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml (deflated 85%) 2025-12-04T14:40:36.8624156Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml (deflated 86%) 2025-12-04T14:40:36.8625105Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml (deflated 85%) 2025-12-04T14:40:36.8626045Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml (deflated 85%) 2025-12-04T14:40:36.8626896Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml (deflated 86%) 2025-12-04T14:40:36.8627843Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml (deflated 85%) 2025-12-04T14:40:36.8628678Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml (deflated 85%) 2025-12-04T14:40:36.8629512Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml (deflated 86%) 2025-12-04T14:40:36.8630459Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml (deflated 85%) 2025-12-04T14:40:36.8631299Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml (deflated 85%) 2025-12-04T14:40:36.8632149Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml (deflated 86%) 2025-12-04T14:40:36.8633487Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml (deflated 85%) 2025-12-04T14:40:36.8634369Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml (deflated 85%) 2025-12-04T14:40:36.8635207Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml (deflated 86%) 2025-12-04T14:40:36.8636056Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml (deflated 85%) 2025-12-04T14:40:36.8637021Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml (deflated 85%) 2025-12-04T14:40:36.8637852Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml (deflated 86%) 2025-12-04T14:40:36.8638785Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml (deflated 85%) 2025-12-04T14:40:36.8639665Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml (deflated 85%) 2025-12-04T14:40:36.8640630Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml (deflated 87%) 2025-12-04T14:40:36.8641476Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml (deflated 85%) 2025-12-04T14:40:36.8642386Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml (deflated 85%) 2025-12-04T14:40:36.8643347Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml (deflated 86%) 2025-12-04T14:40:36.8644191Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml (deflated 85%) 2025-12-04T14:40:36.8645085Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml (deflated 85%) 2025-12-04T14:40:36.8646086Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml (deflated 86%) 2025-12-04T14:40:36.8647011Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml (deflated 85%) 2025-12-04T14:40:36.8648041Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml (deflated 85%) 2025-12-04T14:40:36.8648896Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml (deflated 86%) 2025-12-04T14:40:36.8649787Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml (deflated 85%) 2025-12-04T14:40:36.8650685Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml (deflated 85%) 2025-12-04T14:40:36.8651575Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml (deflated 86%) 2025-12-04T14:40:36.8652476Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml (deflated 85%) 2025-12-04T14:40:36.8653410Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml (deflated 85%) 2025-12-04T14:40:36.8654262Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml (deflated 86%) 2025-12-04T14:40:36.8655106Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml (deflated 85%) 2025-12-04T14:40:36.8656010Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml (deflated 85%) 2025-12-04T14:40:36.8656851Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml (deflated 87%) 2025-12-04T14:40:36.8657742Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml (deflated 85%) 2025-12-04T14:40:36.8658770Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml (deflated 85%) 2025-12-04T14:40:36.8659710Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml (deflated 86%) 2025-12-04T14:40:36.8660617Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml (deflated 85%) 2025-12-04T14:40:36.8661475Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml (deflated 85%) 2025-12-04T14:40:36.8662316Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml (deflated 86%) 2025-12-04T14:40:36.8663338Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml (deflated 85%) 2025-12-04T14:40:36.8664191Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml (deflated 85%) 2025-12-04T14:40:36.8665030Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml (deflated 87%) 2025-12-04T14:40:36.8665872Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml (deflated 85%) 2025-12-04T14:40:36.8666784Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml (deflated 85%) 2025-12-04T14:40:36.8667721Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml (deflated 86%) 2025-12-04T14:40:36.8668641Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml (deflated 85%) 2025-12-04T14:40:36.8669512Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml (deflated 85%) 2025-12-04T14:40:36.8670409Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml (deflated 86%) 2025-12-04T14:40:36.8671443Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml (deflated 85%) 2025-12-04T14:40:36.8672390Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml (deflated 85%) 2025-12-04T14:40:36.8673276Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml (deflated 86%) 2025-12-04T14:40:36.8674178Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml (deflated 85%) 2025-12-04T14:40:36.8675009Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml (deflated 85%) 2025-12-04T14:40:36.8675846Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml (deflated 86%) 2025-12-04T14:40:36.8676796Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml (deflated 85%) 2025-12-04T14:40:36.8677682Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml (deflated 85%) 2025-12-04T14:40:36.8678515Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml (deflated 86%) 2025-12-04T14:40:36.8679412Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml (deflated 85%) 2025-12-04T14:40:36.8680371Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml (deflated 85%) 2025-12-04T14:40:36.8681210Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml (deflated 87%) 2025-12-04T14:40:36.8682193Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml (deflated 85%) 2025-12-04T14:40:36.8683148Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml (deflated 85%) 2025-12-04T14:40:36.8684071Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml (deflated 86%) 2025-12-04T14:40:36.8684921Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml (deflated 85%) 2025-12-04T14:40:36.8685749Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml (deflated 85%) 2025-12-04T14:40:36.8686652Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml (deflated 86%) 2025-12-04T14:40:36.8687560Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml (deflated 85%) 2025-12-04T14:40:36.8688386Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml (deflated 85%) 2025-12-04T14:40:36.8689228Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml (deflated 87%) 2025-12-04T14:40:36.8690176Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml (deflated 85%) 2025-12-04T14:40:36.8691032Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml (deflated 85%) 2025-12-04T14:40:36.8691873Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml (deflated 86%) 2025-12-04T14:40:36.8692843Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml (deflated 85%) 2025-12-04T14:40:36.8693715Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml (deflated 85%) 2025-12-04T14:40:36.8694546Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml (deflated 86%) 2025-12-04T14:40:36.8695466Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml (deflated 85%) 2025-12-04T14:40:36.8696298Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml (deflated 85%) 2025-12-04T14:40:36.8697135Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml (deflated 86%) 2025-12-04T14:40:36.8698054Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml (deflated 85%) 2025-12-04T14:40:36.8698968Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml (deflated 85%) 2025-12-04T14:40:36.8699868Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml (deflated 86%) 2025-12-04T14:40:36.8700774Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml (deflated 85%) 2025-12-04T14:40:36.8701695Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml (deflated 85%) 2025-12-04T14:40:36.8702525Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml (deflated 86%) 2025-12-04T14:40:36.8703416Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml (deflated 85%) 2025-12-04T14:40:36.8704419Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml (deflated 85%) 2025-12-04T14:40:36.8705353Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml (deflated 86%) 2025-12-04T14:40:36.8706215Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml (deflated 85%) 2025-12-04T14:40:36.8707194Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml (deflated 85%) 2025-12-04T14:40:36.8708207Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml (deflated 86%) 2025-12-04T14:40:36.8709147Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml (deflated 85%) 2025-12-04T14:40:36.8710134Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml (deflated 85%) 2025-12-04T14:40:36.8710971Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml (deflated 86%) 2025-12-04T14:40:36.8711895Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml (deflated 85%) 2025-12-04T14:40:36.8712827Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml (deflated 85%) 2025-12-04T14:40:36.8713689Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml (deflated 86%) 2025-12-04T14:40:36.8714536Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml (deflated 85%) 2025-12-04T14:40:36.8715487Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml (deflated 85%) 2025-12-04T14:40:36.8716336Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml (deflated 86%) 2025-12-04T14:40:36.8717356Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml (deflated 85%) 2025-12-04T14:40:36.8718876Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml (deflated 85%) 2025-12-04T14:40:36.8719957Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml (deflated 86%) 2025-12-04T14:40:36.8720951Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml (deflated 85%) 2025-12-04T14:40:36.8721836Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml (deflated 85%) 2025-12-04T14:40:36.8722745Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml (deflated 86%) 2025-12-04T14:40:36.8723677Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml (deflated 85%) 2025-12-04T14:40:36.8724562Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml (deflated 85%) 2025-12-04T14:40:36.8725441Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml (deflated 86%) 2025-12-04T14:40:36.8726558Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml (deflated 85%) 2025-12-04T14:40:36.8727553Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml (deflated 85%) 2025-12-04T14:40:36.8728408Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml (deflated 86%) 2025-12-04T14:40:36.8729419Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml (deflated 85%) 2025-12-04T14:40:36.8730247Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml (deflated 85%) 2025-12-04T14:40:36.8731057Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml (deflated 86%) 2025-12-04T14:40:36.8731907Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml (deflated 85%) 2025-12-04T14:40:36.8732826Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml (deflated 85%) 2025-12-04T14:40:36.8733695Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml (deflated 87%) 2025-12-04T14:40:36.8734594Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml (deflated 85%) 2025-12-04T14:40:36.8735456Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml (deflated 85%) 2025-12-04T14:40:36.8736313Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml (deflated 86%) 2025-12-04T14:40:36.8737244Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml (deflated 85%) 2025-12-04T14:40:36.8738119Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml (deflated 85%) 2025-12-04T14:40:36.8739033Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml (deflated 87%) 2025-12-04T14:40:36.8739962Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml (deflated 85%) 2025-12-04T14:40:36.8740795Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml (deflated 85%) 2025-12-04T14:40:36.8741653Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml (deflated 86%) 2025-12-04T14:40:36.8742582Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml (deflated 85%) 2025-12-04T14:40:36.8743518Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml (deflated 85%) 2025-12-04T14:40:36.8744352Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml (deflated 28%) 2025-12-04T14:40:36.8785775Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-2cb94ab29d3c0df8.xml (deflated 96%) 2025-12-04T14:40:36.8791604Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-ab5aa4d4069f84fb.xml (deflated 91%) 2025-12-04T14:40:36.8797427Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-35f77a6004c12574.xml (deflated 91%) 2025-12-04T14:40:36.8803170Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-4e86df63b0f31fa2.xml (deflated 91%) 2025-12-04T14:40:36.8809200Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-421be54fd2475226.xml (deflated 91%) 2025-12-04T14:40:36.8815074Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-269f711d29febed7.xml (deflated 91%) 2025-12-04T14:40:36.8873651Z adding: test/test-reports/python-pytest/test_ops/test_ops-f35e359ea3f52347.xml (deflated 94%) 2025-12-04T14:40:36.8942041Z adding: test/test-reports/python-pytest/test_ops/test_ops-7327fc5de50caef8.xml (deflated 95%) 2025-12-04T14:40:36.8970663Z adding: 
test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-e6d2768dce09d0dd.xml (deflated 93%) 2025-12-04T14:40:36.8974903Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-52326583abfcb307.xml (deflated 93%) 2025-12-04T14:40:36.8978996Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-b0e5cfa73b17bf79.xml (deflated 93%) 2025-12-04T14:40:36.8983119Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-44c334397fb0c3bd.xml (deflated 93%) 2025-12-04T14:40:36.8988833Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3098e6f6c63481df.xml (deflated 90%) 2025-12-04T14:40:36.9020132Z adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-d8fc516c8be54fc6.xml (deflated 92%) 2025-12-04T14:40:36.9021101Z adding: test/test-reports/python-pytest/inductor.test_layout_optim/inductor.test_layout_optim-ff0e0fc528f4f3dd.xml (deflated 27%) 2025-12-04T14:40:36.9035777Z adding: test/test-reports/python-pytest/dynamo.test_exc/dynamo.test_exc-59dcc92175511a1b.xml (deflated 95%) 2025-12-04T14:40:36.9042496Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_arrayref/inductor.test_aot_inductor_arrayref-c35059cecd7c3b99.xml (deflated 93%) 2025-12-04T14:40:36.9043536Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccee7b90c33901e0.xml (deflated 86%) 2025-12-04T14:40:36.9044501Z adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-4527efee43b2418d.xml (deflated 68%) 2025-12-04T14:40:36.9045536Z adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-1bfa13dfa66ba37a.xml (deflated 68%) 
2025-12-04T14:40:36.9046551Z adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-d291715b5fb08603.xml (deflated 83%) 2025-12-04T14:40:36.9047498Z adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-8f5f76cf24e3a322.xml (deflated 35%) 2025-12-04T14:40:36.9048378Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml (deflated 92%) 2025-12-04T14:40:36.9049161Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml (deflated 85%) 2025-12-04T14:40:36.9049943Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml (deflated 85%) 2025-12-04T14:40:36.9052441Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml (deflated 94%) 2025-12-04T14:40:36.9053238Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml (deflated 72%) 2025-12-04T14:40:36.9059560Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml (deflated 97%) 2025-12-04T14:40:36.9061061Z adding: test/test-reports/python-pytest/inductor.test_flex_flash/inductor.test_flex_flash-594b02277dbafddb.xml (deflated 96%) 2025-12-04T14:40:36.9062170Z adding: test/test-reports/python-pytest/inductor.test_segmented_tree/inductor.test_segmented_tree-3edcd06141f9e439.xml (deflated 79%) 2025-12-04T14:40:36.9063206Z adding: test/test-reports/python-pytest/inductor.test_kernel_optimization/inductor.test_kernel_optimization-b711ca038cf9dded.xml (deflated 38%) 2025-12-04T14:40:36.9064006Z adding: test/test-reports/python-pytest/inductor.test_metrics/inductor.test_metrics-b803ef0c4a9491e7.xml (deflated 69%) 2025-12-04T14:40:36.9064758Z adding: test/test-reports/python-pytest/export.test_unflatten_training_ir/export.test_unflatten_training_ir-0b06d16f89271e02.xml (deflated 92%) 
2025-12-04T14:40:36.9078666Z adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-31757cb9ac5c1c41.xml (deflated 94%) 2025-12-04T14:40:36.9079649Z adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-0fae1aaa4a2003eb.xml (deflated 88%) 2025-12-04T14:40:36.9080715Z adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-bc092756daa3cb92.xml (deflated 80%) 2025-12-04T14:40:36.9151443Z adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-707c3510a208c1b4.xml (deflated 95%) 2025-12-04T14:40:36.9152312Z adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-c5f7586c63bf6a73.xml (deflated 46%) 2025-12-04T14:40:36.9153241Z adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-a0634e57d5b5356a.xml (deflated 87%) 2025-12-04T14:40:36.9154194Z adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-31b2988c053252a6.xml (deflated 62%) 2025-12-04T14:40:36.9155132Z adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-84688fa1768a881f.xml (deflated 47%) 2025-12-04T14:40:36.9156046Z adding: test/test-reports/python-pytest/inductor.test_best_config/inductor.test_best_config-423c961bf0014ead.xml (deflated 51%) 2025-12-04T14:40:36.9156887Z adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-21258e52645f11ac.xml (deflated 47%) 2025-12-04T14:40:36.9167377Z adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-bb39f909a1ebabd7.xml (deflated 96%) 2025-12-04T14:40:36.9169246Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_custom_ops/inductor.test_aot_inductor_custom_ops-a7a529277f9f9a31.xml (deflated 93%) 2025-12-04T14:40:36.9181157Z adding: 
test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-955a1f614439a238.xml (deflated 96%) 2025-12-04T14:40:36.9182391Z adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-e1e65156309c3950.xml (deflated 85%) 2025-12-04T14:40:36.9183266Z adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e4d33245d4b5500b.xml (deflated 89%) 2025-12-04T14:40:36.9185294Z adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-212782fb09fba1b6.xml (deflated 86%) 2025-12-04T14:40:36.9186106Z adding: test/test-reports/python-pytest/inductor.test_needs_exact_strides/inductor.test_needs_exact_strides-9df9f4b52deeb26d.xml (deflated 64%) 2025-12-04T14:40:36.9198146Z adding: test/test-reports/python-pytest/inductor.test_auto_functionalize/inductor.test_auto_functionalize-2721b330ad87dcbb.xml (deflated 95%) 2025-12-04T14:40:36.9199054Z adding: test/test-reports/python-pytest/dynamo.test_modes/dynamo.test_modes-84ffae5b03f7e325.xml (deflated 75%) 2025-12-04T14:40:36.9200031Z adding: test/test-reports/python-pytest/inductor.test_custom_partitioner_fn/inductor.test_custom_partitioner_fn-7cdf0df133f39710.xml (deflated 49%) 2025-12-04T14:40:36.9200980Z adding: test/test-reports/python-pytest/dynamo.test_debug_utils/dynamo.test_debug_utils-34b85abff78e1075.xml (deflated 62%) 2025-12-04T14:40:36.9201993Z adding: test/test-reports/python-pytest/dynamo.test_base_hop/dynamo.test_base_hop-f16fc4276458e68a.xml (deflated 74%) 2025-12-04T14:40:36.9208320Z adding: test/test-reports/python-pytest/dynamo.test_export/dynamo.test_export-b39b0ef66a188fdf.xml (deflated 87%) 2025-12-04T14:40:36.9209030Z adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-89e5f4289c609732.xml (deflated 77%) 2025-12-04T14:40:36.9209832Z adding: test/test-reports/python-pytest/export.test_swap/export.test_swap-4cc8060b16634bb1.xml (deflated 
93%) 2025-12-04T14:40:36.9211296Z adding: test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-9dcd937885307cef.xml (deflated 92%) 2025-12-04T14:40:36.9212033Z adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-22b40053bc190597.xml (deflated 64%) 2025-12-04T14:40:36.9213424Z adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-e50f738759450405.xml (deflated 83%) 2025-12-04T14:40:36.9214805Z adding: test/test-reports/python-pytest/dynamo.test_cudagraphs_expandable_segments/dynamo.test_cudagraphs_expandable_segments-6088bb8977cfc034.xml (deflated 85%) 2025-12-04T14:40:36.9217230Z adding: test/test-reports/python-pytest/inductor.test_caching/inductor.test_caching-81c36c30e8c9f16c.xml (deflated 95%) 2025-12-04T14:40:36.9218555Z adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-8774cf5ade30b7b9.xml (deflated 85%) 2025-12-04T14:40:36.9222315Z adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-debad3a483737c49.xml (deflated 90%) 2025-12-04T14:40:36.9223304Z adding: test/test-reports/python-pytest/dynamo.test_comptime/dynamo.test_comptime-47f6d1d4947f2a1a.xml (deflated 80%) 2025-12-04T14:40:36.9224237Z adding: test/test-reports/python-pytest/test_privateuseone_python_backend/test_privateuseone_python_backend-af92c89cf1e734fe.xml (deflated 51%) 2025-12-04T14:40:36.9225219Z adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-2881d42f49f0d5f4.xml (deflated 77%) 2025-12-04T14:40:36.9226106Z adding: test/test-reports/python-pytest/functorch.test_parsing/functorch.test_parsing-b19da695e97869b8.xml (deflated 77%) 2025-12-04T14:40:36.9226961Z adding: test/test-reports/python-pytest/test_varlen_attention/test_varlen_attention-c77e9d85fce2d7de.xml (deflated 83%) 2025-12-04T14:40:36.9227764Z adding: 
test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-aebaabaedf0418a4.xml (deflated 50%) 2025-12-04T14:40:36.9231557Z adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-b67630ba146aa7a3.xml (deflated 94%) 2025-12-04T14:40:36.9232323Z adding: test/test-reports/python-pytest/test_autoload/test_autoload-f8ddaf02f0fba12a.xml (deflated 38%) 2025-12-04T14:40:36.9233178Z adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-090bcad7e8d69cda.xml (deflated 51%) 2025-12-04T14:40:36.9233951Z adding: test/test-reports/python-pytest/xpu.test_fusion/xpu.test_fusion-5eb21ed31dcf28c7.xml (deflated 27%) 2025-12-04T14:40:36.9271245Z adding: test/test-reports/python-pytest/test_foreach/test_foreach-08a78fa61936b219.xml (deflated 96%) 2025-12-04T14:40:36.9272386Z adding: test/test-reports/python-pytest/test_pytree/test_pytree-863a0f55639901d8.xml (deflated 92%) 2025-12-04T14:40:36.9273189Z adding: test/test-reports/python-pytest/test_namedtuple_return_api/test_namedtuple_return_api-8cf83ed9877ffdac.xml (deflated 72%) 2025-12-04T14:40:36.9274159Z adding: test/test-reports/python-pytest/profiler.test_record_function/profiler.test_record_function-709377f3af2db71e.xml (deflated 74%) 2025-12-04T14:40:36.9275103Z adding: test/test-reports/python-pytest/test_compile_benchmark_util/test_compile_benchmark_util-827ec108afd6e51f.xml (deflated 85%) 2025-12-04T14:40:36.9276255Z adding: test/test-reports/python-pytest/test_set_default_mobile_cpu_allocator/test_set_default_mobile_cpu_allocator-a62b10a07a12c95d.xml (deflated 52%) 2025-12-04T14:40:36.9279690Z adding: test/test-reports/python-pytest/test_fake_tensor/test_fake_tensor-aa30317e19bcd391.xml (deflated 90%) 2025-12-04T14:40:36.9396414Z adding: test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-6264828c6395375f.xml (deflated 97%) 2025-12-04T14:40:36.9548755Z adding: 
test/test-reports/python-pytest/test_meta/test_meta-be563441d2ad6907.xml (deflated 96%) 2025-12-04T14:40:36.9573048Z adding: test/test-reports/python-pytest/test_fx/test_fx-26e481d16f3bec04.xml (deflated 95%) 2025-12-04T14:40:36.9599858Z adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-e4282282966a7ff9.xml (deflated 96%) 2025-12-04T14:40:36.9624197Z adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-59ab3583fe1f80dc.xml (deflated 98%) 2025-12-04T14:40:36.9634303Z adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-5ef9f5430d478fb5.xml (deflated 94%) 2025-12-04T14:40:36.9636845Z adding: test/test-reports/python-pytest/complex_tensor.test_complex_tensor/complex_tensor.test_complex_tensor-b8215f419723e2db.xml (deflated 94%) 2025-12-04T14:40:36.9637986Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bedc5291ea06d7d4.xml (deflated 95%) 2025-12-04T14:40:36.9660370Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-7c0d16c38d6c5d66.xml (deflated 93%) 2025-12-04T14:40:36.9681926Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-53045e36c53c5e0b.xml (deflated 93%) 2025-12-04T14:40:36.9682890Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-3f363f609079497e.xml (deflated 86%) 2025-12-04T14:40:36.9686419Z adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-84ccb0941d397f92.xml (deflated 96%) 2025-12-04T14:40:36.9689731Z adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-7140a1ac93a67fd6.xml (deflated 92%) 2025-12-04T14:40:36.9735467Z adding: test/test-reports/python-pytest/test_nn/test_nn-9b13ea9411b68db1.xml (deflated 97%) 2025-12-04T14:40:36.9736598Z adding: 
test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-6bd6911b2987aa11.xml (deflated 91%) 2025-12-04T14:40:36.9737627Z adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-a884db72e4413a95.xml (deflated 86%) 2025-12-04T14:40:36.9740100Z adding: test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-21d4709389e5b8ee.xml (deflated 92%) 2025-12-04T14:40:36.9742506Z adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-ee3bcbb253c53d76.xml (deflated 95%) 2025-12-04T14:40:36.9743359Z adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-f847b013a309e4b4.xml (deflated 81%) 2025-12-04T14:40:36.9744053Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d415fb392a79d6b5.xml (deflated 37%) 2025-12-04T14:40:36.9744645Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-020323916fe208b4.xml (deflated 35%) 2025-12-04T14:40:36.9745241Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-ec2ab5853a8f5bb5.xml (deflated 35%) 2025-12-04T14:40:36.9745840Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2344770aa1066fe.xml (deflated 35%) 2025-12-04T14:40:36.9746428Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7974fe8ff8b4bbf1.xml (deflated 35%) 2025-12-04T14:40:36.9747011Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a89ca46644423e33.xml (deflated 35%) 2025-12-04T14:40:36.9747595Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a24d5339b3f8a425.xml (deflated 35%) 2025-12-04T14:40:36.9748328Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-e1f683ecde7e9859.xml (deflated 36%) 2025-12-04T14:40:36.9748937Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-18c9d7ef1c10ca2f.xml (deflated 35%) 2025-12-04T14:40:36.9749524Z adding: 
test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c34005545b418b69.xml (deflated 35%) 2025-12-04T14:40:36.9750105Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-fef6b14731bb5c3b.xml (deflated 36%) 2025-12-04T14:40:36.9750795Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-9f269287574f2f5d.xml (deflated 35%) 2025-12-04T14:40:36.9751380Z adding: test/test-reports/python-pytest/test_native_mha/test_native_mha-92c62998ce15c273.xml (deflated 93%) 2025-12-04T14:40:36.9752143Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numerictypes/torch_np.numpy_tests.core.test_numerictypes-794afd975a25dd08.xml (deflated 90%) 2025-12-04T14:40:36.9752962Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-264e490a42609567.xml (deflated 37%) 2025-12-04T14:40:36.9753674Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3424f861fbb3f30b.xml (deflated 36%) 2025-12-04T14:40:36.9754375Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-34f31a11d054a5e2.xml (deflated 37%) 2025-12-04T14:40:36.9755073Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-5cc2b0915fc09b1e.xml (deflated 36%) 2025-12-04T14:40:36.9755768Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8b636380fe175cf1.xml (deflated 36%) 2025-12-04T14:40:36.9756466Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3e5b65c914a199ce.xml (deflated 37%) 2025-12-04T14:40:36.9757159Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-df0d139575237859.xml (deflated 34%) 2025-12-04T14:40:36.9757875Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3264b6f40acb9dc5.xml (deflated 36%) 
2025-12-04T14:40:36.9758569Z adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8f705997ba04d72c.xml (deflated 34%) 2025-12-04T14:40:36.9759242Z adding: test/test-reports/python-pytest/test_function_schema/test_function_schema-23d7f60e4862c430.xml (deflated 82%) 2025-12-04T14:40:36.9759878Z adding: test/test-reports/python-pytest/test_accelerator/test_accelerator-96bd402fc51988fa.xml (deflated 80%) 2025-12-04T14:40:36.9760574Z adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-4d2f3b79bbd3891e.xml (deflated 83%) 2025-12-04T14:40:36.9761326Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_methods/torch_np.numpy_tests.core.test_scalar_methods-ab020ac5345dfbce.xml (deflated 96%) 2025-12-04T14:40:36.9762209Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_helper/torch_np.numpy_tests.fft.test_helper-f56f2bb61bb011d4.xml (deflated 74%) 2025-12-04T14:40:36.9762962Z adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-6eaefe04adaaf056.xml (deflated 80%) 2025-12-04T14:40:36.9770791Z adding: test/test-reports/python-pytest/test_overrides/test_overrides-54542bcdcb986158.xml (deflated 95%) 2025-12-04T14:40:36.9771511Z adding: test/test-reports/python-pytest/torch_np.test_function_base/torch_np.test_function_base-75280ab4c9ebfe9c.xml (deflated 47%) 2025-12-04T14:40:36.9776631Z adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-263da3bcb01bdba5.xml (deflated 96%) 2025-12-04T14:40:36.9777362Z adding: test/test-reports/python-pytest/torch_np.test_scalars_0D_arrays/torch_np.test_scalars_0D_arrays-0d870852e9b8ecce.xml (deflated 91%) 2025-12-04T14:40:36.9778166Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-ed390b613a87fd4b.xml (deflated 43%) 2025-12-04T14:40:36.9778839Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-30e4b3748ee506f0.xml 
(deflated 42%) 2025-12-04T14:40:36.9779494Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-a99d9031384e91f6.xml (deflated 36%) 2025-12-04T14:40:36.9780149Z adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-f9602eb3757a7925.xml (deflated 42%) 2025-12-04T14:40:36.9780916Z adding: test/test-reports/python-pytest/profiler.test_profiler_tree/profiler.test_profiler_tree-3ad20a65a3b0a20b.xml (deflated 82%) 2025-12-04T14:40:36.9781740Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraysetops/torch_np.numpy_tests.lib.test_arraysetops-0f1d2c69ab44ce0e.xml (deflated 95%) 2025-12-04T14:40:36.9782728Z adding: test/test-reports/python-pytest/test_dlpack/test_dlpack-41fa7f929a572602.xml (deflated 94%) 2025-12-04T14:40:36.9783599Z adding: test/test-reports/python-pytest/profiler.test_torch_tidy/profiler.test_torch_tidy-6d6836cfdd083f06.xml (deflated 76%) 2025-12-04T14:40:36.9784277Z adding: test/test-reports/python-pytest/lazy.test_reuse_ir/lazy.test_reuse_ir-927d5809a72b7fa4.xml (deflated 62%) 2025-12-04T14:40:36.9785006Z adding: test/test-reports/python-pytest/test_functional_autograd_benchmark/test_functional_autograd_benchmark-0542bd395ed50334.xml (deflated 54%) 2025-12-04T14:40:36.9828935Z adding: test/test-reports/python-pytest/test_reductions/test_reductions-113ccd6215a90199.xml (deflated 96%) 2025-12-04T14:40:36.9829769Z adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204144025.xml (deflated 43%) 2025-12-04T14:40:36.9868889Z ##[group]Run # Remove any previous usage logs if they exist 2025-12-04T14:40:36.9869227Z # Remove any previous usage logs if they exist 2025-12-04T14:40:36.9869488Z rm -f logs-*.zip 2025-12-04T14:40:36.9869741Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-12-04T14:40:36.9870102Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-12-04T14:40:36.9877296Z shell: /usr/bin/bash 
--noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:36.9877584Z env: 2025-12-04T14:40:36.9877738Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:36.9877930Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:36.9878173Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:36.9878568Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:36.9879070Z FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T14:40:36.9879423Z ##[endgroup] 2025-12-04T14:40:36.9958330Z adding: usage_log.txt (deflated 58%) 2025-12-04T14:40:37.0028670Z adding: test/test-reports/inductor.test_aot_inductor_2.4_15a925ff16cb0669_.log (deflated 91%) 2025-12-04T14:40:37.0037365Z adding: test/test-reports/dynamo.test_repros_1.1_21fd2d3c0d4dd552_.log (deflated 85%) 2025-12-04T14:40:37.0041567Z adding: test/test-reports/inductor.test_flex_attention_2.6_90be3f66c016358d_.log (deflated 90%) 2025-12-04T14:40:37.0172892Z adding: test/test-reports/inductor.test_cuda_select_algorithm_1.1_c5144f504c6801ae_.log (deflated 97%) 2025-12-04T14:40:37.0221042Z adding: test/test-reports/inductor.test_compile_subprocess_2.2_b00c67e905398654_.log (deflated 95%) 2025-12-04T14:40:37.0230928Z adding: test/test-reports/test_decomp_1.22_14e10bdd16255327_.log (deflated 89%) 2025-12-04T14:40:37.0240757Z adding: test/test-reports/test_decomp_5.22_0df19fb5b56a60f0_.log (deflated 88%) 2025-12-04T14:40:37.0249956Z adding: test/test-reports/test_decomp_10.22_12f8ad2c135e2bbb_.log (deflated 88%) 2025-12-04T14:40:37.0260064Z adding: test/test-reports/test_decomp_15.22_3612d41b87c57e18_.log (deflated 88%) 2025-12-04T14:40:37.0269686Z adding: test/test-reports/test_decomp_20.22_0285151ec6a3cff1_.log (deflated 88%) 2025-12-04T14:40:37.0350109Z adding: test/test-reports/test_ops_3.9_cd041df637bc19f1_.log (deflated 92%) 2025-12-04T14:40:37.0429668Z adding: test/test-reports/test_ops_8.9_44eee4e8e92c270e_.log (deflated 92%) 
2025-12-04T14:40:37.0430302Z adding: test/test-reports/test_cuda_primary_ctx_1.1_e42f702cb40c5a59_.log (deflated 85%) 2025-12-04T14:40:37.0446965Z adding: test/test-reports/inductor.test_torchinductor_dynamic_shapes_4.4_d7a417aa701cd416_.log (deflated 92%) 2025-12-04T14:40:37.0456406Z adding: test/test-reports/inductor.test_torchinductor_opinfo_2.13_c074395000e8f728_.log (deflated 92%) 2025-12-04T14:40:37.0465143Z adding: test/test-reports/inductor.test_torchinductor_opinfo_7.13_206d55120439a46b_.log (deflated 93%) 2025-12-04T14:40:37.0465859Z adding: test/test-reports/profiler.test_profiler_tree_1.1_8c9f11eb1f1e482c_.log (deflated 77%) 2025-12-04T14:40:37.0473382Z adding: test/test-reports/inductor.test_torchinductor_opinfo_12.13_b0b968134062f752_.log (deflated 91%) 2025-12-04T14:40:37.0475866Z adding: test/test-reports/inductor.test_cuda_repro_1.1_1c12e5d6528c7a17_.log (deflated 84%) 2025-12-04T14:40:37.0490439Z adding: test/test-reports/inductor.test_compiled_autograd_1.2_dcced55d4b6d289b_.log (deflated 90%) 2025-12-04T14:40:37.0491110Z adding: test/test-reports/inductor.test_layout_optim_1.1_85312b2aa31c9171_.log (deflated 50%) 2025-12-04T14:40:37.0491710Z adding: test/test-reports/dynamo.test_exc_1.1_e6b03d521cd15643_.log (deflated 72%) 2025-12-04T14:40:37.0498829Z adding: test/test-reports/inductor.test_aot_inductor_arrayref_1.2_f8b4577ce160ed2e_.log (deflated 90%) 2025-12-04T14:40:37.0499361Z adding: test/test-reports/inductor.test_halide_1.1_2220e392a986bf8c_.log (deflated 8%) 2025-12-04T14:40:37.0500055Z adding: test/test-reports/inductor.test_deterministic_1.3_31baf2894918a3e4_.log (deflated 81%) 2025-12-04T14:40:37.0500755Z adding: test/test-reports/dynamo.test_deque_reconstruct_1.1_5be42757c28bbc33_.log (deflated 63%) 2025-12-04T14:40:37.0501443Z adding: test/test-reports/inductor.test_inductor_annotations_1.1_acd21ad590bd7056_.log (deflated 59%) 2025-12-04T14:40:37.0502133Z adding: 
test/test-reports/inductor.test_compile_worker_1.1_22d83b26da38be9a_.log (deflated 76%) 2025-12-04T14:40:37.0502814Z adding: test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_2cb42dcf29e7ede2_.log (deflated 53%) 2025-12-04T14:40:37.0514825Z adding: test/test-reports/inductor.test_fp8_1.1_041887d0b8d7fee8_.log (deflated 94%) 2025-12-04T14:40:37.0516296Z adding: test/test-reports/inductor.test_flex_flash_1.1_143acf3e8eed598e_.log (deflated 92%) 2025-12-04T14:40:37.0517288Z adding: test/test-reports/inductor.test_segmented_tree_1.1_84df512657bb7938_.log (deflated 74%) 2025-12-04T14:40:37.0518273Z adding: test/test-reports/inductor.test_kernel_optimization_1.1_55e099cc3f4c3e00_.log (deflated 54%) 2025-12-04T14:40:37.0518942Z adding: test/test-reports/inductor.test_metrics_1.1_8516a4d7ea2a79fb_.log (deflated 64%) 2025-12-04T14:40:37.0520275Z adding: test/test-reports/export.test_unflatten_training_ir_1.1_381cd1c16fe9e11b_.log (deflated 85%) 2025-12-04T14:40:37.0531134Z adding: test/test-reports/inductor.test_triton_kernels_1.1_cd88835deb58b6d4_.log (deflated 92%) 2025-12-04T14:40:37.0531669Z adding: test/test-reports/inductor.test_lookup_table_1.1_5735430f25b64d36_.log (deflated 6%) 2025-12-04T14:40:37.0532202Z adding: test/test-reports/inductor.test_cutedsl_template_1.1_aa2751f33400ad4f_.log (deflated 77%) 2025-12-04T14:40:37.0532958Z adding: test/test-reports/inductor.test_benchmark_fusion_1.1_e074574f7d298815_.log (deflated 75%) 2025-12-04T14:40:37.0569889Z adding: test/test-reports/export.test_serdes_1.1_e8663fe68509169e_.log (deflated 91%) 2025-12-04T14:40:37.0570530Z adding: test/test-reports/inductor.test_control_deps_1.1_8b7ff51ad1c7850a_.log (deflated 51%) 2025-12-04T14:40:37.0571171Z adding: test/test-reports/inductor.test_benchmarking_1.1_829d05e5f8b311c4_.log (deflated 79%) 2025-12-04T14:40:37.0572004Z adding: test/test-reports/inductor.test_helion_kernels_1.1_570e1b539a64331c_.log (deflated 57%) 2025-12-04T14:40:37.0572666Z adding: 
test/test-reports/inductor.test_quantization_1.1_121915a11d71492f_.log (deflated 56%) 2025-12-04T14:40:37.0573305Z adding: test/test-reports/inductor.test_best_config_1.1_4e63c9241f189413_.log (deflated 53%) 2025-12-04T14:40:37.0573796Z adding: test/test-reports/export.test_tools_1.1_3c90f78b66d684d8_.log (deflated 63%) 2025-12-04T14:40:37.0580283Z adding: test/test-reports/inductor.test_compiled_optimizers_1.3_97f7ba3c63654c1d_.log (deflated 92%) 2025-12-04T14:40:37.0581893Z adding: test/test-reports/torch_np.numpy_tests.lib.test_arraysetops_1.1_330f28fa6e8cce7d_.log (deflated 89%) 2025-12-04T14:40:37.0583445Z adding: test/test-reports/inductor.test_aot_inductor_custom_ops_1.1_dddd6a5d20b7cc0a_.log (deflated 88%) 2025-12-04T14:40:37.0975805Z adding: test/test-reports/inductor.test_control_flow_4.5_21e8d4f49de459e9_.log (deflated 97%) 2025-12-04T14:40:37.0976464Z adding: test/test-reports/dynamo.test_cudagraphs_1.1_90173d668b3e025f_.log (deflated 68%) 2025-12-04T14:40:37.0977076Z adding: test/test-reports/inductor.test_alignment_1.1_a164af8fc87f74b5_.log (deflated 73%) 2025-12-04T14:40:37.0978954Z adding: test/test-reports/dynamo.test_guard_serialization_1.1_1b7b87cb0989e9eb_.log (deflated 84%) 2025-12-04T14:40:37.0979634Z adding: test/test-reports/inductor.test_needs_exact_strides_1.1_5109866c57595440_.log (deflated 58%) 2025-12-04T14:40:37.0980488Z adding: test/test-reports/inductor.test_auto_functionalize_1.1_68ce77f985508c50_.log (deflated 85%) 2025-12-04T14:40:37.0981470Z adding: test/test-reports/dynamo.test_modes_1.1_8c4add47a33e36b7_.log (deflated 81%) 2025-12-04T14:40:37.0982122Z adding: test/test-reports/inductor.test_custom_partitioner_fn_1.1_c3b6cd6a5d7fdcc1_.log (deflated 55%) 2025-12-04T14:40:37.0982788Z adding: test/test-reports/dynamo.test_debug_utils_1.1_95c2c78decfe726b_.log (deflated 62%) 2025-12-04T14:40:37.0983393Z adding: test/test-reports/dynamo.test_base_hop_1.1_b2087f66aa3d673c_.log (deflated 71%) 2025-12-04T14:40:37.0993356Z adding: 
test/test-reports/dynamo.test_export_1.1_b7c3be727fa89598_.log (deflated 91%) 2025-12-04T14:40:37.0993975Z adding: test/test-reports/dynamo.test_python_dispatcher_1.1_786a010f0f7c3940_.log (deflated 69%) 2025-12-04T14:40:37.0994585Z adding: test/test-reports/export.test_swap_1.1_95c916183e2c0305_.log (deflated 78%) 2025-12-04T14:40:37.0995746Z adding: test/test-reports/export.test_unflatten_1.1_d630949213564262_.log (deflated 78%) 2025-12-04T14:40:37.0996398Z adding: test/test-reports/dynamo.test_verify_correctness_1.1_d2d881eebc4cfc16_.log (deflated 67%) 2025-12-04T14:40:37.0999491Z adding: test/test-reports/test_dlpack_1.1_1ec0373c5e8d3363_.log (deflated 91%) 2025-12-04T14:40:37.1000576Z adding: test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_3fbc3d993fa8b554_.log (deflated 81%) 2025-12-04T14:40:37.1001435Z adding: test/test-reports/profiler.test_torch_tidy_1.1_59c0964fd7f7a67f_.log (deflated 76%) 2025-12-04T14:40:37.1002173Z adding: test/test-reports/dynamo.test_cudagraphs_expandable_segments_1.1_461bf64d7b370157_.log (deflated 72%) 2025-12-04T14:40:37.1006420Z adding: test/test-reports/inductor.test_caching_1.1_ee4444a2502ab159_.log (deflated 93%) 2025-12-04T14:40:37.1007134Z adding: test/test-reports/dynamo.test_reorder_logs_1.1_bce7080c1339fb4f_.log (deflated 78%) 2025-12-04T14:40:37.1012167Z adding: test/test-reports/dynamo.test_subclasses_1.1_0f5f75f18480ff60_.log (deflated 89%) 2025-12-04T14:40:37.1012712Z adding: test/test-reports/dynamo.test_comptime_1.1_ee7f00d5f391ca6c_.log (deflated 71%) 2025-12-04T14:40:37.1013251Z adding: test/test-reports/test_privateuseone_python_backend_1.1_d1fcfb50f1d5d34a_.log (deflated 58%) 2025-12-04T14:40:37.1013918Z adding: test/test-reports/functorch.test_rearrange_1.1_f58f64fb207f9ea4_.log (deflated 71%) 2025-12-04T14:40:37.1014543Z adding: test/test-reports/functorch.test_parsing_1.1_ab4f4097a5615c37_.log (deflated 73%) 2025-12-04T14:40:37.1015547Z adding: 
test/test-reports/test_varlen_attention_1.1_7b4553019fa7c3b1_.log (deflated 85%) 2025-12-04T14:40:37.1016130Z adding: test/test-reports/test_mkl_verbose_1.1_8c6cbee907023829_.log (deflated 54%) 2025-12-04T14:40:37.1027808Z adding: test/test-reports/test_cpp_api_parity_1.1_c465bf1c09b2866a_.log (deflated 94%) 2025-12-04T14:40:37.1028387Z adding: test/test-reports/test_autoload_1.1_7ada199d937afcb8_.log (deflated 50%) 2025-12-04T14:40:37.1028992Z adding: test/test-reports/nn.attention.test_open_registry_1.1_7d49315786a6f063_.log (deflated 58%) 2025-12-04T14:40:37.1029780Z adding: test/test-reports/xpu.test_fusion_1.1_008b7f4febfa87c6_.log (deflated 48%) 2025-12-04T14:40:37.1101857Z adding: test/test-reports/test_foreach_1.1_76ff14b11afb6f5e_.log (deflated 95%) 2025-12-04T14:40:37.1104057Z adding: test/test-reports/test_pytree_1.1_d855357a70a16b8f_.log (deflated 87%) 2025-12-04T14:40:37.1104651Z adding: test/test-reports/test_namedtuple_return_api_1.1_7b0afa1d6fad9127_.log (deflated 60%) 2025-12-04T14:40:37.1105311Z adding: test/test-reports/profiler.test_record_function_1.1_27a3a5f054eb4856_.log (deflated 70%) 2025-12-04T14:40:37.1105970Z adding: test/test-reports/test_compile_benchmark_util_1.1_37f34d6b957df075_.log (deflated 53%) 2025-12-04T14:40:37.1106573Z adding: test/test-reports/lazy.test_reuse_ir_1.1_27a380ba8ffdf251_.log (deflated 59%) 2025-12-04T14:40:37.1107205Z adding: test/test-reports/test_set_default_mobile_cpu_allocator_1.1_d39ef648b820daa5_.log (deflated 59%) 2025-12-04T14:40:37.1113916Z adding: test/test-reports/test_fake_tensor_1.1_e61dd7d666716729_.log (deflated 90%) 2025-12-04T14:40:37.1344171Z adding: test/test-reports/test_binary_ufuncs_1.1_62e3371632b5f5b8_.log (deflated 96%) 2025-12-04T14:40:37.1549907Z adding: test/test-reports/test_meta_2.4_5af3c72f6896ae42_.log (deflated 94%) 2025-12-04T14:40:37.1576795Z adding: test/test-reports/test_fx_1.1_c9e7dc6459df6851_.log (deflated 92%) 2025-12-04T14:40:37.1607220Z adding: 
test/test-reports/test_ops_gradients_2.4_85b1c0ac8f503e20_.log (deflated 93%) 2025-12-04T14:40:37.1619011Z adding: test/test-reports/test_nestedtensor_3.4_dff7d83ca5d5d6c7_.log (deflated 91%) 2025-12-04T14:40:37.1787592Z adding: test/test-reports/functorch.test_control_flow_4.4_38ed588b114098c9_.log (deflated 96%) 2025-12-04T14:40:37.1792634Z adding: test/test-reports/complex_tensor.test_complex_tensor_3.3_18c49b70545b8444_.log (deflated 93%) 2025-12-04T14:40:37.1793324Z adding: test/test-reports/optim.test_optim_1.1_96914e6399c32275_.log (deflated 7%) 2025-12-04T14:40:37.1794148Z adding: test/test-reports/test_functional_autograd_benchmark_1.1_a799b1a1c12db472_.log (deflated 87%) 2025-12-04T14:40:37.1796355Z adding: test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_de16653e9fff9a91_.log (deflated 90%) 2025-12-04T14:40:37.1821653Z adding: test/test-reports/functorch.test_ops_1.9_c0eee613fccab590_.log (deflated 91%) 2025-12-04T14:40:37.1845930Z adding: test/test-reports/functorch.test_ops_6.9_28c20a9b783bf56f_.log (deflated 91%) 2025-12-04T14:40:37.1846740Z adding: test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_82959dff790c271d_.log (deflated 77%) 2025-12-04T14:40:37.1853297Z adding: test/test-reports/torch_np.test_ndarray_methods_1.1_6320f1f29b59eecd_.log (deflated 94%) 2025-12-04T14:40:37.1858481Z adding: test/test-reports/test_view_ops_1.1_cb0863e2f971ce65_.log (deflated 91%) 2025-12-04T14:40:37.1917446Z adding: test/test-reports/test_nn_1.1_a9c5853bd400a590_.log (deflated 95%) 2025-12-04T14:40:37.1995958Z adding: test/test-reports/test_reductions_1.1_99f27656cb9f6489_.log (deflated 96%) 2025-12-04T14:40:37.1997175Z adding: test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a3265ffbea1c2963_.log (deflated 85%) 2025-12-04T14:40:37.1998648Z adding: test/test-reports/test_jit_autocast_1.1_125de5112eb33739_.log (deflated 81%) 2025-12-04T14:40:37.2002462Z adding: test/test-reports/nn.test_pooling_1.1_2fc4f4b87243805c_.log 
(deflated 90%) 2025-12-04T14:40:37.2006459Z adding: test/test-reports/nn.test_embedding_1.1_0d2f816320a0d140_.log (deflated 93%) 2025-12-04T14:40:37.2007083Z adding: test/test-reports/test_xnnpack_integration_1.1_cee22bfc3e04069b_.log (deflated 72%) 2025-12-04T14:40:37.2008197Z adding: test/test-reports/test_cuda_trace_1.1_a09bfc4c91236df9_.log (deflated 92%) 2025-12-04T14:40:37.2010278Z adding: test/test-reports/test_native_mha_1.1_0baa40d098e975c1_.log (deflated 93%) 2025-12-04T14:40:37.2011296Z adding: test/test-reports/torch_np.numpy_tests.core.test_numerictypes_1.1_2b678d8fcdaea099_.log (deflated 86%) 2025-12-04T14:40:37.2012427Z adding: test/test-reports/test_cuda_nvml_based_avail_1.1_f1988982a9fd6374_.log (deflated 92%) 2025-12-04T14:40:37.2013119Z adding: test/test-reports/test_function_schema_1.1_77625ad433e579dc_.log (deflated 77%) 2025-12-04T14:40:37.2013764Z adding: test/test-reports/test_accelerator_1.1_df51eee5e1970e59_.log (deflated 72%) 2025-12-04T14:40:37.2014671Z adding: test/test-reports/nn.test_init_1.1_83fc7b147f0c612f_.log (deflated 78%) 2025-12-04T14:40:37.2015670Z adding: test/test-reports/torch_np.test_scalars_0D_arrays_1.1_ed0b8e715c6c1568_.log (deflated 85%) 2025-12-04T14:40:37.2018005Z adding: test/test-reports/torch_np.numpy_tests.core.test_scalar_methods_1.1_ab75aa6938d6d2cf_.log (deflated 92%) 2025-12-04T14:40:37.2018618Z adding: test/test-reports/torch_np.numpy_tests.fft.test_helper_1.1_692efc716c4081b6_.log (deflated 69%) 2025-12-04T14:40:37.2019153Z adding: test/test-reports/test_mobile_optimizer_1.1_9983f8f253c5b059_.log (deflated 67%) 2025-12-04T14:40:37.2041127Z adding: test/test-reports/test_overrides_1.1_89ffca0310ffa68a_.log (deflated 93%) 2025-12-04T14:40:37.2041734Z adding: test/test-reports/torch_np.test_function_base_1.1_500584d725d1f2e7_.log (deflated 55%) 2025-12-04T14:40:37.2050876Z adding: test/test-reports/test_type_promotion_1.1_6db001c785be877d_.log (deflated 94%) 2025-12-04T14:40:37.2074690Z ##[group]Run # Remove 
any previous debugging artifacts if they exist 2025-12-04T14:40:37.2075063Z # Remove any previous debugging artifacts if they exist 2025-12-04T14:40:37.2075365Z rm -f debug-*.zip 2025-12-04T14:40:37.2075581Z if [ -d 'test/debug' ]; then 2025-12-04T14:40:37.2075837Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-12-04T14:40:37.2076091Z fi 2025-12-04T14:40:37.2083033Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:37.2083315Z env: 2025-12-04T14:40:37.2083474Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:37.2083672Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:37.2083899Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:37.2084295Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:37.2084792Z FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862 2025-12-04T14:40:37.2085160Z ##[endgroup] 2025-12-04T14:40:37.2161487Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T14:40:37.2161737Z with: 2025-12-04T14:40:37.2161922Z s3-bucket: gha-artifacts 2025-12-04T14:40:37.2162164Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T14:40:37.2162421Z retention-days: 14 2025-12-04T14:40:37.2162607Z if-no-files-found: warn 2025-12-04T14:40:37.2162808Z path: test-jsons-*.zip 2025-12-04T14:40:37.2162996Z name: artifact 2025-12-04T14:40:37.2163169Z region: us-east-1 2025-12-04T14:40:37.2163337Z env: 2025-12-04T14:40:37.2163494Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:37.2163694Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:37.2163924Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:37.2164321Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:37.2164735Z ##[endgroup] 2025-12-04T14:40:37.5394632Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T14:40:37.5395106Z With the provided path, there will be 1 file uploaded 2025-12-04T14:40:37.5395527Z Uploading to s3 
prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T14:40:37.5450462Z Starting upload of test-jsons-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip 2025-12-04T14:40:37.7549405Z Finished upload of test-jsons-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip 2025-12-04T14:40:37.7779869Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T14:40:37.7780122Z with: 2025-12-04T14:40:37.7780299Z s3-bucket: gha-artifacts 2025-12-04T14:40:37.7780550Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T14:40:37.7781026Z retention-days: 14 2025-12-04T14:40:37.7781217Z if-no-files-found: error 2025-12-04T14:40:37.7781424Z path: test-reports-*.zip 2025-12-04T14:40:37.7781617Z name: artifact 2025-12-04T14:40:37.7781787Z region: us-east-1 2025-12-04T14:40:37.7781956Z env: 2025-12-04T14:40:37.7782125Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:37.7782324Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:37.7782561Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:37.7782972Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:37.7783339Z ##[endgroup] 2025-12-04T14:40:38.0706484Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T14:40:38.0706929Z With the provided path, there will be 1 file uploaded 2025-12-04T14:40:38.0707333Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T14:40:38.0760928Z Starting upload of test-reports-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip 2025-12-04T14:40:38.2669276Z Finished upload of test-reports-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip 2025-12-04T14:40:38.2901084Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T14:40:38.2901345Z with: 2025-12-04T14:40:38.2901523Z s3-bucket: gha-artifacts 2025-12-04T14:40:38.2901773Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T14:40:38.2902034Z 
retention-days: 14 2025-12-04T14:40:38.2902228Z if-no-files-found: ignore 2025-12-04T14:40:38.2902438Z path: logs-*.zip 2025-12-04T14:40:38.2902617Z name: artifact 2025-12-04T14:40:38.2902789Z region: us-east-1 2025-12-04T14:40:38.2902956Z env: 2025-12-04T14:40:38.2903118Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:38.2903314Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:38.2903561Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:38.2903958Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:38.2904341Z ##[endgroup] 2025-12-04T14:40:38.5845428Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T14:40:38.5845876Z With the provided path, there will be 1 file uploaded 2025-12-04T14:40:38.5846278Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T14:40:38.5899281Z Starting upload of logs-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip 2025-12-04T14:40:38.8064063Z Finished upload of logs-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip 2025-12-04T14:40:38.8297687Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T14:40:38.8297940Z with: 2025-12-04T14:40:38.8298117Z s3-bucket: gha-artifacts 2025-12-04T14:40:38.8298364Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T14:40:38.8298628Z retention-days: 14 2025-12-04T14:40:38.8298833Z if-no-files-found: ignore 2025-12-04T14:40:38.8299039Z path: debug-*.zip 2025-12-04T14:40:38.8299210Z name: artifact 2025-12-04T14:40:38.8299402Z region: us-east-1 2025-12-04T14:40:38.8299571Z env: 2025-12-04T14:40:38.8299734Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:38.8299929Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:38.8300166Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:38.8300569Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:38.8300927Z ##[endgroup] 2025-12-04T14:40:39.1187023Z No 
files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-12-04T14:40:39.1426290Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T14:40:39.1426603Z # shellcheck disable=SC2156 2025-12-04T14:40:39.1427035Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T14:40:39.1434957Z shell: /usr/bin/bash -e {0} 2025-12-04T14:40:39.1435164Z env: 2025-12-04T14:40:39.1435320Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:39.1435519Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:39.1435928Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:39.1436325Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:39.1436678Z ##[endgroup] 2025-12-04T14:40:39.4888803Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a 2025-12-04T14:40:39.4889192Z with: 2025-12-04T14:40:39.4889481Z name: coredumps-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu 2025-12-04T14:40:39.4889816Z retention-days: 14 2025-12-04T14:40:39.4890002Z if-no-files-found: ignore 2025-12-04T14:40:39.4890212Z path: ./**/core.[1-9]* 2025-12-04T14:40:39.4890402Z s3-bucket: gha-artifacts 2025-12-04T14:40:39.4890592Z region: us-east-1 2025-12-04T14:40:39.4890751Z env: 2025-12-04T14:40:39.4890893Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:39.4891074Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:39.4891298Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:39.4891707Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:39.4892073Z ##[endgroup] 2025-12-04T14:40:49.8824696Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 
2025-12-04T14:40:49.9175357Z Prepare all required actions 2025-12-04T14:40:49.9175704Z Getting action download info 2025-12-04T14:40:50.0938430Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548) 2025-12-04T14:40:50.5079722Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-12-04T14:40:50.5080106Z with: 2025-12-04T14:40:50.5080283Z job_id: 57116084862 2025-12-04T14:40:50.5080714Z job_name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check) 2025-12-04T14:40:50.5081175Z workflow_name: trunk 2025-12-04T14:40:50.5081366Z workflow_run_id: 19922768520 2025-12-04T14:40:50.5081586Z workflow_attempt: 1 2025-12-04T14:40:50.5081791Z env: 2025-12-04T14:40:50.5081955Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:50.5082161Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:50.5082413Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:50.5082851Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:50.5083212Z ##[endgroup] 2025-12-04T14:40:50.5115575Z ##[group]Run actions/setup-python@v6 2025-12-04T14:40:50.5115813Z with: 2025-12-04T14:40:50.5115984Z python-version: 3.10 2025-12-04T14:40:50.5116181Z check-latest: false 2025-12-04T14:40:50.5116465Z token: *** 2025-12-04T14:40:50.5116647Z update-environment: true 2025-12-04T14:40:50.5116858Z allow-prereleases: false 2025-12-04T14:40:50.5117359Z freethreaded: false 2025-12-04T14:40:50.5117561Z env: 2025-12-04T14:40:50.5117724Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:50.5117927Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:50.5118172Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:50.5118594Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:50.5118959Z ##[endgroup] 2025-12-04T14:40:50.9027895Z ##[group]Installed versions 2025-12-04T14:40:50.9036308Z Version 3.10 was not found in 
the local cache 2025-12-04T14:40:50.9192358Z (node:305767) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2025-12-04T14:40:50.9193085Z (Use `node --trace-deprecation ...` to show where the warning was created) 2025-12-04T14:40:51.2552450Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system. The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json 2025-12-04T14:40:51.2747469Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-12-04T14:40:51.2747831Z with: 2025-12-04T14:40:51.2747976Z env: 2025-12-04T14:40:51.2748136Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:51.2748551Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:51.2748793Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:51.2749180Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:51.2749532Z ##[endgroup] 2025-12-04T14:40:51.2763212Z ##[group]Run set -eou pipefail 2025-12-04T14:40:51.2763447Z set -eou pipefail 2025-12-04T14:40:51.2763633Z  2025-12-04T14:40:51.2763909Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-12-04T14:40:51.2764337Z for _ in $(seq 1440); do 2025-12-04T14:40:51.2764666Z  # Break if no ssh session exists anymore 2025-12-04T14:40:51.2764909Z  if [ "$(who)" = "" ]; then 2025-12-04T14:40:51.2765155Z  break 2025-12-04T14:40:51.2765314Z  fi 2025-12-04T14:40:51.2765474Z  echo "." 
2025-12-04T14:40:51.2765651Z  sleep 5 2025-12-04T14:40:51.2765805Z done 2025-12-04T14:40:51.2773350Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:51.2773631Z env: 2025-12-04T14:40:51.2773795Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:51.2773983Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:51.2774211Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:51.2774601Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:51.2774948Z ##[endgroup] 2025-12-04T14:40:51.2802118Z Holding runner for 2 hours until all ssh sessions have logged out 2025-12-04T14:40:51.2894626Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T14:40:51.2895027Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T14:40:51.2895338Z # shellcheck disable=SC2046 2025-12-04T14:40:51.2895590Z docker stop $(docker ps -q) || true 2025-12-04T14:40:51.2895844Z # Prune all of the docker images 2025-12-04T14:40:51.2896085Z docker system prune -af 2025-12-04T14:40:51.2903189Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:40:51.2903471Z env: 2025-12-04T14:40:51.2903632Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:40:51.2903820Z HAS_NVIDIA_GPU: true 2025-12-04T14:40:51.2904050Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:40:51.2904443Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:40:51.2904798Z ##[endgroup] 2025-12-04T14:41:02.4769499Z e29498c26bf7 2025-12-04T14:41:09.0091776Z Deleted Containers: 2025-12-04T14:41:09.0092203Z e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:41:09.0092518Z 2025-12-04T14:41:20.3041143Z Deleted Images: 2025-12-04T14:41:20.3041595Z untagged: public.ecr.aws/docker/library/python:3.13 2025-12-04T14:41:20.3042275Z untagged: 
public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T14:41:20.3043030Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a 2025-12-04T14:41:20.3043833Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd 2025-12-04T14:41:20.3044404Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67 2025-12-04T14:41:20.3044976Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c 2025-12-04T14:41:20.3045559Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212 2025-12-04T14:41:20.3046054Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318 2025-12-04T14:41:20.3046477Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560 2025-12-04T14:41:20.3047206Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee 2025-12-04T14:41:20.3048066Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T14:41:20.3049158Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T14:41:20.3049780Z deleted: sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301 2025-12-04T14:41:20.3050218Z deleted: sha256:85a76b7bf29ad34eb76cce6f46af5d49a58b6272f80f983d5c769e82c7749301 2025-12-04T14:41:20.3050660Z deleted: sha256:0882f3ce59ff5ae30195ee4b059fc713e13eda107a3a7814a4616ac9058a30a4 2025-12-04T14:41:20.3051095Z deleted: sha256:64ba5b9344c11a3e4729136076830b90ac4cf1554046edb1bd4f0784b66ebd9b 2025-12-04T14:41:20.3051522Z deleted: sha256:88213c59cf461a65ab9b6cb07b4195dc9d41b5241c152daa002c7b3112e09124 2025-12-04T14:41:20.3051959Z deleted: 
sha256:4c0f83afa802ffbc05ebaf1aa50e48a2447c7c295549a6dded80ac63437906ca 2025-12-04T14:41:20.3052398Z deleted: sha256:6f7ec74460e8fb070c8209949095ea3be5f4e2fd69c9f750cd39ac4093f5e64b 2025-12-04T14:41:20.3052831Z deleted: sha256:d6928b0d1021b31942fdcb64e5eb4a34682de66e959dd424ed6ed02c29cd706d 2025-12-04T14:41:20.3053265Z deleted: sha256:4e9fbcb1705a6351bb34dd320558752614308636b94fd9ae6f26063e3deadc0a 2025-12-04T14:41:20.3053688Z deleted: sha256:43aabd0201f48712f21758071352dea029b4de37be08b2e2197706856a9ecbf2 2025-12-04T14:41:20.3054114Z deleted: sha256:940a98dec78303f0548beb1033242a45e9097607ef3e55c8b949b69b73d1b95e 2025-12-04T14:41:20.3054536Z deleted: sha256:d2849fa0e0411cf66e4408831d70e38838afb55b11a80c1c4d8aa0ae7dc9ca40 2025-12-04T14:41:20.3055014Z deleted: sha256:14f40d23c20c7e562623f89deb376520296758bc39dd3c77284049b84ebd8a31 2025-12-04T14:41:20.3055536Z deleted: sha256:a8ccba61f90ca097cb391d0f4fbed0d9f821d06b00e28f7332e9e2dcfcbac4ca 2025-12-04T14:41:20.3056138Z deleted: sha256:91b2060d290547d3b517d4a11d994bbe23f4560b5546cb91918ca1828dde6be1 2025-12-04T14:41:20.3056564Z deleted: sha256:b42a184755715dcfead7fad655a127433541d316d9628f5f730ff17ad5f8071c 2025-12-04T14:41:20.3056992Z deleted: sha256:aa5b4f3c9169061dc3c6da0e677e8a86f11ecb0a3f9fb4861ab3d8c04379775c 2025-12-04T14:41:20.3057436Z deleted: sha256:b4dcf450081a48d77fea0a21b8d810a69c03608a595e754fe7d365058d0579b7 2025-12-04T14:41:20.3057876Z deleted: sha256:4f7fe12d3d4f5bf890c7ada4ce16f17a105472aa6509a778f917dcce2f28174b 2025-12-04T14:41:20.3058310Z deleted: sha256:2d1d5a74182594f9a8553df00fdcfc809dba407bcd6700d667f862cbe9d555ce 2025-12-04T14:41:20.3058750Z deleted: sha256:d901e2f5d449aeed16b727bdcc11fc0e0f6c30c8fc5c39ac7eeac8a74d9d176c 2025-12-04T14:41:20.3059220Z deleted: sha256:a04df2603bd12372c6632469a9a81ebc4a8d677452c250672b9692884fa6a452 2025-12-04T14:41:20.3059644Z deleted: sha256:f438a6b52273a552dc3820a55c74c53a62a0eae9f2a7d21b37125add7d71639f 2025-12-04T14:41:20.3060073Z deleted: 
sha256:d4b09517e9518d709ac98b0ae6f8446ec9ac51688253607b1fca67aa2c87b3f4 2025-12-04T14:41:20.3060501Z deleted: sha256:c1fa38335237f5e7263e39d3d3de98215bcfbbb12b826955c02e149bf68efd13 2025-12-04T14:41:20.3060932Z deleted: sha256:c898d20a30de901fca74d7611663b17ab48e1726a11e031e40548ed16ee81877 2025-12-04T14:41:20.3061373Z deleted: sha256:3baceec7096518fcc10696feba551639d698b3145c2fc09cac927bb60c0fd751 2025-12-04T14:41:20.3061805Z deleted: sha256:5245aaaa3d5c3a19f76b9a6c920bd82d1a0ff5289f87c8c109652089709d9b3b 2025-12-04T14:41:20.3062231Z deleted: sha256:f05cc789b95246938c377f474c41187965b89ceac0250e7d5124bec32153f447 2025-12-04T14:41:20.3062668Z deleted: sha256:07ec4fc008de4e7a2c794ec7094cc72e0d287c04c8b2156163aee0bae147fe2d 2025-12-04T14:41:20.3063103Z deleted: sha256:c6302601ad5fde573c1f8c900250478fca7fdc6907d8fd4fae651b94b4d9264d 2025-12-04T14:41:20.3063534Z deleted: sha256:cc5e955ee1dc54931f02606c5ea87aae14f03b5d764492be611480ab041f2882 2025-12-04T14:41:20.3063965Z deleted: sha256:f21c03518996d98452338f4e80bcfd9b139a1dab155f4830be0d3f623035269f 2025-12-04T14:41:20.3064523Z deleted: sha256:519ca6f1279f7886f25f0005527cfa627deebbc5b7d7cdbfa7ef962bcfc4c26d 2025-12-04T14:41:20.3064960Z deleted: sha256:0ef990495216807d0175b192045be3f617e72331bc373b3434807f41bf69168d 2025-12-04T14:41:20.3065381Z deleted: sha256:7093edf7319e1f0e01654c3224e32c8dede5b948d106e0b9b03cbf0bb1091e33 2025-12-04T14:41:20.3065890Z deleted: sha256:c478161e058e2f4041555c3e880b95ee1ee047938dc58549a3a88135740996ae 2025-12-04T14:41:20.3066311Z deleted: sha256:9bb853b0d938cd7c36a80ce8ee40653f2c0ff92719209b11beb03acc8855ce3e 2025-12-04T14:41:20.3066738Z deleted: sha256:fdf2ace71a78ce6910ef9c4b073c195531da47022443b606bb92dcd6499b6afc 2025-12-04T14:41:20.3067193Z deleted: sha256:576c2b3770d871937d3cfb7014328bcb4bd1aed0c28bc438764b3bfdac4c1ac2 2025-12-04T14:41:20.3067624Z deleted: sha256:878e92b9cb82de09ac14a9d5f3f7bc2411a799b6f54d0d64b78c2bb4d1fdc0fc 2025-12-04T14:41:20.3068053Z deleted: 
sha256:85c8c3b98b65a6695f988a10cc66c981d73a3ef03eda15b8e14d227b50b56300 2025-12-04T14:41:20.3068488Z deleted: sha256:ce2ab3ba07794f9ee95d6ea7de6dcd3d2aed96561f9a79192dd56ca5bf29313a 2025-12-04T14:41:20.3068922Z deleted: sha256:37a6e12976ca957286977e696e63012ab9821214b0483fe1a48d29dcb280508a 2025-12-04T14:41:20.3069352Z deleted: sha256:cd1d5d3dd7038144ca6fe961c0d4c8e705625ae0c36190ba8b3e9602abedad19 2025-12-04T14:41:20.3069782Z deleted: sha256:0e707276e0be2e0008b86d594fadc0d16444d66c4fb7227c56f144cbb3c2affd 2025-12-04T14:41:20.3070214Z deleted: sha256:22d4aad6a2ada91b341c1225a0f314042b8aeabef7568c5c019709b058bf070b 2025-12-04T14:41:20.3070655Z deleted: sha256:ee4adacf4e0933131d0275eddad406b3c8147e6cf07a292b99f1aff4b5355f33 2025-12-04T14:41:20.3071088Z deleted: sha256:43da0b9e7c0e18403dcb834e53628dc7c970ccb2dbd091878c0d7c0170dbc97f 2025-12-04T14:41:20.3071515Z deleted: sha256:00571684bdcd75beda15eb7d4e79b5458bc914350f9bb4d87fcdc97ad15e0da1 2025-12-04T14:41:20.3071957Z deleted: sha256:41615f09950259f1d75e82ef35b6fc53b18fe71ebff143744cfd51009d04349e 2025-12-04T14:41:20.3072403Z deleted: sha256:75ab34d2eed3c7915467a506ab6dab2711918fbabe94add2fb5c62780221ab0c 2025-12-04T14:41:20.3072853Z deleted: sha256:0a39ef2bebf44c1c3893d1e5fb42dad48b8fac7ca673141267ee967f85455e89 2025-12-04T14:41:20.3073287Z deleted: sha256:9b7d024e48ba1f9824a54597621b1b062cbc4aa41a77d81ca538d6b5c24a612c 2025-12-04T14:41:20.3073732Z deleted: sha256:392257172de6434c271bd93394218a91e9aa86d7c18abc2f2759317b9d5fb6de 2025-12-04T14:41:20.3074155Z deleted: sha256:6c3232860b930866a463a356124fc392c7e5f04895695229257e8c3e8a02711d 2025-12-04T14:41:20.3074572Z deleted: sha256:63dd55b807215e2fa6c715419ac0c5072d02dddc848dbf74bb7e77b906b5eaed 2025-12-04T14:41:20.3075014Z deleted: sha256:07a8738c1b4584db72ed9aa60f5274321eb0ba16263450da3a75df8326ebc25f 2025-12-04T14:41:20.3075439Z deleted: sha256:053fe2965b01281d12040ec1893e0d1aa77362a49ea9a1067402272c69dad9f5 2025-12-04T14:41:20.3075869Z deleted: 
sha256:7857fb5eb181c4e80262ecab60bdd3c266cf3d1409ceb76c05882609b416a8d3 2025-12-04T14:41:20.3076297Z deleted: sha256:752528477fc99089de3bd2c6da7b30cf34f2e901fe06d8fcfe685b411461e883 2025-12-04T14:41:20.3076734Z deleted: sha256:cce0210e2f4b042601813df03aa294a86b0c668fcfc75f4c63f6fa12b2952e15 2025-12-04T14:41:20.3077166Z deleted: sha256:f2bb405a26705ecd12d21380d26d9355d01db3a2175080fbdb468f2b5a25a76c 2025-12-04T14:41:20.3077603Z deleted: sha256:ad430120d4ffbaf97cd8d6de6ea8eefa4a8f80ec45f0b176c6b26bff0970fd33 2025-12-04T14:41:20.3078054Z deleted: sha256:225a4910baea7cc540ed43eeac75046293800ab0b8e0192b51e991c8cb50bcf3 2025-12-04T14:41:20.3078490Z deleted: sha256:a259945b0c3507f049fbac10fb3d3ffe43d45e83c91b80ae8cd1dafb855ad83c 2025-12-04T14:41:20.3078922Z deleted: sha256:862a98881b1d5adad5c21d01602773b894794097de80964ef8f47bcaadb43255 2025-12-04T14:41:20.3079343Z deleted: sha256:1cf6d3c8b6c2694b79a2d08719594903811c330a36a4c7a8a7153a350b53d292 2025-12-04T14:41:20.3079773Z deleted: sha256:232a1ae8b0fee817ff7838bb5986a2f38377d3b1dbbf5217b576af0f953b0844 2025-12-04T14:41:20.3080333Z deleted: sha256:c72c5705dabd6314423dd7d4fb260a20d5d9886b2ebce60d19e9d78c4a2335c2 2025-12-04T14:41:20.3080857Z deleted: sha256:296734cf81fd92c913884d058908598424ffe072676e38de289bbab83768c7bd 2025-12-04T14:41:20.3081274Z deleted: sha256:7c76040481b889847a1804021aeff07547eaa4ee706d6137db218d497a8fd9c1 2025-12-04T14:41:20.3081707Z deleted: sha256:d5e293f5b354e8cbcc6de893ea72cc632b02d8fdfbb08ec3127c4e9662f3ebff 2025-12-04T14:41:20.3082222Z deleted: sha256:f35a64e429c88e249645090f21fbe7dae108d98e0ab4ea13184f24b3fd66c315 2025-12-04T14:41:20.3082815Z deleted: sha256:ce6ae8d595c8e69115c51b1ce4f9a9158795d7b863b1cb53f21c39a87974d41b 2025-12-04T14:41:20.3083257Z deleted: sha256:8941abaee59400fb9b3a60765fea4a1fc2a6a447467a6d983e84c7f72494a450 2025-12-04T14:41:20.3083696Z deleted: sha256:ef53c29a9a2c2bc80ffdb9bfaf92842436b5755ec1ce828b9d11e5e27d656ea1 2025-12-04T14:41:20.3084134Z deleted: 
sha256:7a347fb0acb43f1c814f8c8ff21185e8b5cf64d7bc5988cea060f77d906e08b5 2025-12-04T14:41:20.3084562Z deleted: sha256:cc855dc9be79496e15175569dced2d13477e50b077a5fd3945f9bf50018880c1 2025-12-04T14:41:20.3084996Z deleted: sha256:f7a9946ada3d4786658bc0b643808bb32a9a45e4e90e30dc43ee19e2dbe24024 2025-12-04T14:41:20.3085435Z deleted: sha256:c22a9215f62812c1d2e32827f5221ff556c5b6702aadbdab6b87b8293f19635e 2025-12-04T14:41:20.3085853Z deleted: sha256:959a56746620012e37c1def1a83c5afb1e7c0adc59b021a28beb53c24df98032 2025-12-04T14:41:20.3086286Z deleted: sha256:31a0fff0695bf6100c17954be72eab2095b466d559c75c3faf2a17d8c41e6ebe 2025-12-04T14:41:20.3086722Z deleted: sha256:c15e2b5241b9e55af1b2593e544391b4b44d0505e6528e8f12425136e93b424c 2025-12-04T14:41:20.3087150Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b 2025-12-04T14:41:20.3087402Z 2025-12-04T14:41:20.3087489Z Total reclaimed space: 40.14GB 2025-12-04T14:41:20.3135778Z ##[group]Run set +e 2025-12-04T14:41:20.3136045Z set +e 2025-12-04T14:41:20.3136233Z set -x 2025-12-04T14:41:20.3136395Z  2025-12-04T14:41:20.3136549Z nvidia-smi 2025-12-04T14:41:20.3136893Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in 2025-12-04T14:41:20.3137381Z # the case where the driver has already crashed as it still can get the driver version 2025-12-04T14:41:20.3137840Z # and some basic information like the bus ID. However, the rest of the information 2025-12-04T14:41:20.3138210Z # would be missing (ERR!), for example: 2025-12-04T14:41:20.3138444Z # 2025-12-04T14:41:20.3138666Z # +-----------------------------------------------------------------------------+ 2025-12-04T14:41:20.3139042Z # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | 2025-12-04T14:41:20.3139441Z # |-------------------------------+----------------------+----------------------+ 2025-12-04T14:41:20.3139820Z # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. 
ECC | 2025-12-04T14:41:20.3140220Z # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T14:41:20.3140561Z # | | | MIG M. | 2025-12-04T14:41:20.3140821Z # |===============================+======================+======================| 2025-12-04T14:41:20.3141107Z # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | 2025-12-04T14:41:20.3141450Z # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | 2025-12-04T14:41:20.3141767Z # | | | ERR! | 2025-12-04T14:41:20.3142067Z # +-------------------------------+----------------------+----------------------+ 2025-12-04T14:41:20.3142323Z # 2025-12-04T14:41:20.3142534Z # +-----------------------------------------------------------------------------+ 2025-12-04T14:41:20.3142848Z # | Processes: | 2025-12-04T14:41:20.3143185Z # | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T14:41:20.3143491Z # | ID ID Usage | 2025-12-04T14:41:20.3143748Z # |=============================================================================| 2025-12-04T14:41:20.3144206Z # +-----------------------------------------------------------------------------+ 2025-12-04T14:41:20.3144463Z # 2025-12-04T14:41:20.3144725Z # This should be reported as a failure instead as it will guarantee to fail when 2025-12-04T14:41:20.3145096Z # Docker tries to run with --gpus all 2025-12-04T14:41:20.3145324Z # 2025-12-04T14:41:20.3145576Z # So, the correct check here is to query one of the missing piece of info like 2025-12-04T14:41:20.3145932Z # GPU name, so that the command can fail accordingly 2025-12-04T14:41:20.3146286Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T14:41:20.3146596Z NVIDIA_SMI_STATUS=$? 
2025-12-04T14:41:20.3146775Z  2025-12-04T14:41:20.3147076Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T14:41:20.3147532Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T14:41:20.3147952Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T14:41:20.3148298Z  .github/scripts/stop_runner_service.sh 2025-12-04T14:41:20.3148525Z fi 2025-12-04T14:41:20.3148673Z  2025-12-04T14:41:20.3149057Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the 2025-12-04T14:41:20.3149498Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails 2025-12-04T14:41:20.3149876Z # https://github.com/pytorch/test-infra/issues/4000 2025-12-04T14:41:20.3150178Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l) 2025-12-04T14:41:20.3150429Z NVIDIA_SMI_STATUS=$? 2025-12-04T14:41:20.3150616Z  2025-12-04T14:41:20.3150917Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T14:41:20.3151370Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T14:41:20.3151765Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T14:41:20.3152110Z  .github/scripts/stop_runner_service.sh 2025-12-04T14:41:20.3152339Z fi 2025-12-04T14:41:20.3152491Z  2025-12-04T14:41:20.3152667Z # Check the GPU count to be a power of 2 2025-12-04T14:41:20.3153054Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then 2025-12-04T14:41:20.3153575Z  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..." 
2025-12-04T14:41:20.3153970Z  .github/scripts/stop_runner_service.sh 2025-12-04T14:41:20.3154198Z fi 2025-12-04T14:41:20.3164453Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:41:20.3164739Z env: 2025-12-04T14:41:20.3164903Z GIT_DEFAULT_BRANCH: main 2025-12-04T14:41:20.3165109Z HAS_NVIDIA_GPU: true 2025-12-04T14:41:20.3165339Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T14:41:20.3165729Z DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 2025-12-04T14:41:20.3166081Z ##[endgroup] 2025-12-04T14:41:20.3197184Z + nvidia-smi 2025-12-04T14:41:20.3408628Z Thu Dec 4 14:41:20 2025 2025-12-04T14:41:20.3409000Z +-----------------------------------------------------------------------------------------+ 2025-12-04T14:41:20.3409495Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T14:41:20.3409944Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T14:41:20.3410402Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T14:41:20.3410889Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T14:41:20.3411612Z | | | MIG M. 
| 2025-12-04T14:41:20.3412123Z |=========================================+========================+======================| 2025-12-04T14:41:20.3578849Z | 0 NVIDIA L4 On | 00000000:35:00.0 Off | 0 | 2025-12-04T14:41:20.3579296Z | N/A 31C P8 12W / 72W | 0MiB / 23034MiB | 0% Default | 2025-12-04T14:41:20.3579654Z | | | N/A | 2025-12-04T14:41:20.3580023Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T14:41:20.3582378Z 2025-12-04T14:41:20.3582562Z +-----------------------------------------------------------------------------------------+ 2025-12-04T14:41:20.3582969Z | Processes: | 2025-12-04T14:41:20.3583419Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T14:41:20.3583803Z | ID ID Usage | 2025-12-04T14:41:20.3584357Z |=========================================================================================| 2025-12-04T14:41:20.3587441Z | No running processes found | 2025-12-04T14:41:20.3587792Z +-----------------------------------------------------------------------------------------+ 2025-12-04T14:41:20.5903758Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T14:41:20.6090073Z NVIDIA L4 2025-12-04T14:41:20.6145367Z + NVIDIA_SMI_STATUS=0 2025-12-04T14:41:20.6145620Z + '[' 0 -ne 0 ']' 2025-12-04T14:41:20.6151618Z ++ nvidia-smi --list-gpus 2025-12-04T14:41:20.6152806Z ++ wc -l 2025-12-04T14:41:20.6394553Z + GPU_COUNT=1 2025-12-04T14:41:20.6394800Z + NVIDIA_SMI_STATUS=0 2025-12-04T14:41:20.6395023Z + '[' 0 -ne 0 ']' 2025-12-04T14:41:20.6395364Z + '[' 1 -le 8 ']' 2025-12-04T14:41:20.6395550Z + '[' 1 -ne 1 ']' 2025-12-04T14:41:20.6461036Z Post job cleanup. 2025-12-04T14:41:20.6520001Z Post job cleanup. 2025-12-04T14:41:20.6556067Z Post job cleanup. 
2025-12-04T14:41:20.7506688Z [command]/usr/bin/git version 2025-12-04T14:41:20.7546542Z git version 2.50.1 2025-12-04T14:41:20.7579971Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/590277cf-288d-4b27-bbc2-e34b5fb61d2c/.gitconfig' 2025-12-04T14:41:20.7589622Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/590277cf-288d-4b27-bbc2-e34b5fb61d2c' before making global git config changes 2025-12-04T14:41:20.7590564Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T14:41:20.7594886Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T14:41:20.7639792Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T14:41:20.7677908Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T14:41:20.8017741Z Entering 'android/libs/fbjni' 2025-12-04T14:41:20.8083346Z Entering 'third_party/FP16' 2025-12-04T14:41:20.8148324Z Entering 'third_party/FXdiv' 2025-12-04T14:41:20.8213231Z Entering 'third_party/NNPACK' 2025-12-04T14:41:20.8276165Z Entering 'third_party/NVTX' 2025-12-04T14:41:20.8343240Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:41:20.8408015Z Entering 'third_party/XNNPACK' 2025-12-04T14:41:20.8487661Z Entering 'third_party/aiter' 2025-12-04T14:41:20.8553630Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:41:20.8627911Z Entering 'third_party/benchmark' 2025-12-04T14:41:20.8693598Z Entering 'third_party/composable_kernel' 2025-12-04T14:41:20.8766654Z Entering 'third_party/cpp-httplib' 2025-12-04T14:41:20.8832748Z Entering 'third_party/cpuinfo' 2025-12-04T14:41:20.8897654Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:41:20.8963973Z Entering 'third_party/cutlass' 2025-12-04T14:41:20.9040736Z 
Entering 'third_party/fbgemm' 2025-12-04T14:41:20.9105607Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T14:41:20.9171517Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:41:20.9242133Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:41:20.9306394Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:41:20.9379393Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:41:20.9443691Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:41:20.9506134Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:41:20.9571742Z Entering 'third_party/flash-attention' 2025-12-04T14:41:20.9634777Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:41:20.9705170Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:41:20.9781584Z Entering 'third_party/flatbuffers' 2025-12-04T14:41:20.9850178Z Entering 'third_party/fmt' 2025-12-04T14:41:20.9914462Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:41:20.9979463Z Entering 'third_party/gloo' 2025-12-04T14:41:21.0044600Z Entering 'third_party/googletest' 2025-12-04T14:41:21.0111043Z Entering 'third_party/ideep' 2025-12-04T14:41:21.0179645Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:41:21.0251844Z Entering 'third_party/ittapi' 2025-12-04T14:41:21.0319753Z Entering 'third_party/kineto' 2025-12-04T14:41:21.0383114Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:41:21.0445108Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:41:21.0508854Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:41:21.0573887Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:41:21.0637683Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:41:21.0701393Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 
2025-12-04T14:41:21.0774712Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T14:41:21.0838163Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:41:21.0903716Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:41:21.0968273Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:41:21.1033230Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T14:41:21.1095698Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:41:21.1160822Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:41:21.1230770Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:41:21.1293778Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:41:21.1358298Z Entering 'third_party/kleidiai' 2025-12-04T14:41:21.1424205Z Entering 'third_party/mimalloc' 2025-12-04T14:41:21.1496535Z Entering 'third_party/nlohmann' 2025-12-04T14:41:21.1568491Z Entering 'third_party/onnx' 2025-12-04T14:41:21.1647874Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:41:21.1717147Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:41:21.1783203Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:41:21.1847201Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:41:21.1915337Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:41:21.1984399Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:41:21.2057914Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:41:21.2127103Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:41:21.2190739Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:41:21.2251915Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:41:21.2321020Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:41:21.2385174Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:41:21.2476727Z Entering 'third_party/pocketfft' 2025-12-04T14:41:21.2542458Z Entering 'third_party/protobuf' 2025-12-04T14:41:21.2608960Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:41:21.2673495Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:41:21.2740336Z Entering 'third_party/psimd' 2025-12-04T14:41:21.2811738Z Entering 'third_party/pthreadpool' 2025-12-04T14:41:21.2876834Z Entering 'third_party/pybind11' 2025-12-04T14:41:21.2943660Z Entering 'third_party/python-peachpy' 2025-12-04T14:41:21.3006939Z Entering 'third_party/sleef' 2025-12-04T14:41:21.3080292Z Entering 'third_party/tensorpipe' 2025-12-04T14:41:21.3142773Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:41:21.3222205Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:41:21.3285947Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:41:21.3357908Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:41:21.3427982Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:41:21.3523259Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T14:41:21.3545617Z http.https://github.com/.extraheader 2025-12-04T14:41:21.3554982Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T14:41:21.3587824Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 
'http.https://github.com/.extraheader' || :" 2025-12-04T14:41:21.3919157Z Entering 'android/libs/fbjni' 2025-12-04T14:41:21.3962922Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4004272Z Entering 'third_party/FP16' 2025-12-04T14:41:21.4045104Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4085874Z Entering 'third_party/FXdiv' 2025-12-04T14:41:21.4130154Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4172291Z Entering 'third_party/NNPACK' 2025-12-04T14:41:21.4213595Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4259391Z Entering 'third_party/NVTX' 2025-12-04T14:41:21.4301535Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4355274Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:41:21.4402665Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4444199Z Entering 'third_party/XNNPACK' 2025-12-04T14:41:21.4491712Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4552047Z Entering 'third_party/aiter' 2025-12-04T14:41:21.4594779Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4635580Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:41:21.4681879Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4738680Z Entering 'third_party/benchmark' 2025-12-04T14:41:21.4783409Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4824320Z Entering 'third_party/composable_kernel' 2025-12-04T14:41:21.4868144Z http.https://github.com/.extraheader 2025-12-04T14:41:21.4914936Z Entering 'third_party/cpp-httplib' 2025-12-04T14:41:21.4965732Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5006493Z Entering 'third_party/cpuinfo' 2025-12-04T14:41:21.5047037Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5087195Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:41:21.5130756Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5172772Z Entering 'third_party/cutlass' 2025-12-04T14:41:21.5215701Z http.https://github.com/.extraheader 
2025-12-04T14:41:21.5265807Z Entering 'third_party/fbgemm' 2025-12-04T14:41:21.5309529Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5357934Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T14:41:21.5399760Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5439322Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:41:21.5480769Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5532458Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:41:21.5575831Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5616609Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:41:21.5655676Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5704156Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:41:21.5753021Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5793733Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:41:21.5835546Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5874910Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:41:21.5922529Z http.https://github.com/.extraheader 2025-12-04T14:41:21.5967157Z Entering 'third_party/flash-attention' 2025-12-04T14:41:21.6010664Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6051161Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:41:21.6092296Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6139678Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:41:21.6180991Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6233337Z Entering 'third_party/flatbuffers' 2025-12-04T14:41:21.6275363Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6318963Z Entering 'third_party/fmt' 2025-12-04T14:41:21.6361157Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6403699Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:41:21.6447772Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6487831Z Entering 
'third_party/gloo' 2025-12-04T14:41:21.6531256Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6580700Z Entering 'third_party/googletest' 2025-12-04T14:41:21.6623305Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6664629Z Entering 'third_party/ideep' 2025-12-04T14:41:21.6710664Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6755590Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:41:21.6798299Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6846069Z Entering 'third_party/ittapi' 2025-12-04T14:41:21.6889304Z http.https://github.com/.extraheader 2025-12-04T14:41:21.6929004Z Entering 'third_party/kineto' 2025-12-04T14:41:21.6971144Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7014758Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:41:21.7056317Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7095815Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:41:21.7137282Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7178737Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:41:21.7222083Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7264641Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:41:21.7314284Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7361750Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:41:21.7404628Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7444052Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T14:41:21.7486172Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7530803Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T14:41:21.7572227Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7618110Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:41:21.7660399Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7703128Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:41:21.7745828Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7787365Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:41:21.7830311Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7877149Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T14:41:21.7923558Z http.https://github.com/.extraheader 2025-12-04T14:41:21.7963693Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:41:21.8004511Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8048646Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:41:21.8090746Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8144217Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:41:21.8191566Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8233346Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:41:21.8275206Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8321507Z Entering 'third_party/kleidiai' 2025-12-04T14:41:21.8368997Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8409138Z Entering 'third_party/mimalloc' 2025-12-04T14:41:21.8450730Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8492425Z Entering 'third_party/nlohmann' 2025-12-04T14:41:21.8535338Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8583084Z Entering 'third_party/onnx' 2025-12-04T14:41:21.8625680Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8680181Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:41:21.8722389Z 
http.https://github.com/.extraheader 2025-12-04T14:41:21.8770786Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:41:21.8811865Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8853864Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:41:21.8895575Z http.https://github.com/.extraheader 2025-12-04T14:41:21.8935034Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:41:21.8981301Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9030743Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:41:21.9071842Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9112741Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:41:21.9152625Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9198796Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:41:21.9240653Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9288842Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:41:21.9330794Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9375445Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:41:21.9416102Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9454247Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:41:21.9495578Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9537831Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:41:21.9580087Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9621331Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:41:21.9662235Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9724807Z Entering 'third_party/pocketfft' 2025-12-04T14:41:21.9772317Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9818240Z Entering 
'third_party/protobuf' 2025-12-04T14:41:21.9860677Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9901293Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:41:21.9942546Z http.https://github.com/.extraheader 2025-12-04T14:41:21.9986672Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:41:22.0030585Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0074736Z Entering 'third_party/psimd' 2025-12-04T14:41:22.0119311Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0164751Z Entering 'third_party/pthreadpool' 2025-12-04T14:41:22.0208427Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0248037Z Entering 'third_party/pybind11' 2025-12-04T14:41:22.0290500Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0337626Z Entering 'third_party/python-peachpy' 2025-12-04T14:41:22.0379936Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0419832Z Entering 'third_party/sleef' 2025-12-04T14:41:22.0459894Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0499810Z Entering 'third_party/tensorpipe' 2025-12-04T14:41:22.0541008Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0580987Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:41:22.0623174Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0663665Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:41:22.0705136Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0745662Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:41:22.0794329Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0839165Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:41:22.0880507Z http.https://github.com/.extraheader 2025-12-04T14:41:22.0916902Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:41:22.0962894Z http.https://github.com/.extraheader 2025-12-04T14:41:22.1030689Z [command]/usr/bin/git config --local --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.1062640Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T14:41:22.1407145Z Entering 'android/libs/fbjni' 2025-12-04T14:41:22.1434008Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T14:41:22.1454351Z Entering 'third_party/FP16' 2025-12-04T14:41:22.1488091Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T14:41:22.1507593Z Entering 'third_party/FXdiv' 2025-12-04T14:41:22.1534867Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T14:41:22.1555464Z Entering 'third_party/NNPACK' 2025-12-04T14:41:22.1588441Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T14:41:22.1608382Z Entering 'third_party/NVTX' 2025-12-04T14:41:22.1634978Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T14:41:22.1656301Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:41:22.1684788Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T14:41:22.1705342Z Entering 'third_party/XNNPACK' 2025-12-04T14:41:22.1733888Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T14:41:22.1768253Z Entering 'third_party/aiter' 2025-12-04T14:41:22.1796402Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T14:41:22.1816418Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:41:22.1845129Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T14:41:22.1873967Z Entering 'third_party/benchmark' 2025-12-04T14:41:22.1902968Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:41:22.1924594Z Entering 'third_party/composable_kernel' 2025-12-04T14:41:22.1953154Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T14:41:22.1981916Z Entering 'third_party/cpp-httplib' 2025-12-04T14:41:22.2014997Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T14:41:22.2035610Z Entering 'third_party/cpuinfo' 2025-12-04T14:41:22.2064081Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T14:41:22.2085134Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:41:22.2113767Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T14:41:22.2135971Z Entering 'third_party/cutlass' 2025-12-04T14:41:22.2164313Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T14:41:22.2193841Z Entering 'third_party/fbgemm' 2025-12-04T14:41:22.2223347Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T14:41:22.2245415Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T14:41:22.2272697Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T14:41:22.2292701Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:41:22.2322102Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T14:41:22.2349487Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:41:22.2376745Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T14:41:22.2396218Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:41:22.2425202Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T14:41:22.2453548Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:41:22.2481445Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T14:41:22.2499986Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:41:22.2527170Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T14:41:22.2546460Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:41:22.2574093Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T14:41:22.2600356Z Entering 'third_party/flash-attention' 2025-12-04T14:41:22.2630576Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T14:41:22.2649785Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:41:22.2676712Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T14:41:22.2701547Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:41:22.2728922Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T14:41:22.2759183Z Entering 'third_party/flatbuffers' 2025-12-04T14:41:22.2790005Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T14:41:22.2814085Z Entering 'third_party/fmt' 2025-12-04T14:41:22.2843506Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T14:41:22.2864169Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:41:22.2893128Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T14:41:22.2918760Z Entering 'third_party/gloo' 2025-12-04T14:41:22.2945847Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T14:41:22.2966774Z Entering 'third_party/googletest' 2025-12-04T14:41:22.2995086Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:41:22.3015856Z Entering 'third_party/ideep' 2025-12-04T14:41:22.3044877Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T14:41:22.3063035Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:41:22.3098277Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T14:41:22.3126202Z Entering 'third_party/ittapi' 2025-12-04T14:41:22.3154268Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T14:41:22.3174859Z Entering 'third_party/kineto' 2025-12-04T14:41:22.3210994Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config 
remote.origin.url 2025-12-04T14:41:22.3229696Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:41:22.3257231Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T14:41:22.3275395Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:41:22.3303780Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T14:41:22.3325529Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:41:22.3353423Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T14:41:22.3373649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:41:22.3402080Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T14:41:22.3422981Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:41:22.3451011Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T14:41:22.3469996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T14:41:22.3497459Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T14:41:22.3520277Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-12-04T14:41:22.3546910Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T14:41:22.3566939Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:41:22.3594611Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:41:22.3615323Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:41:22.3651775Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T14:41:22.3672760Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:41:22.3701528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T14:41:22.3726174Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T14:41:22.3753411Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T14:41:22.3772383Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:41:22.3801848Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T14:41:22.3823008Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 
2025-12-04T14:41:22.3851133Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T14:41:22.3876705Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:41:22.3904910Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T14:41:22.3926304Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:41:22.3953432Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T14:41:22.3976460Z Entering 'third_party/kleidiai' 2025-12-04T14:41:22.4005122Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T14:41:22.4027064Z Entering 'third_party/mimalloc' 2025-12-04T14:41:22.4054546Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T14:41:22.4074410Z Entering 'third_party/nlohmann' 2025-12-04T14:41:22.4102617Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T14:41:22.4133095Z Entering 'third_party/onnx' 2025-12-04T14:41:22.4160920Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T14:41:22.4193926Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:41:22.4223193Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:41:22.4246935Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:41:22.4275076Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T14:41:22.4295597Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:41:22.4323921Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:41:22.4343290Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:41:22.4371443Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:41:22.4391765Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:41:22.4421016Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T14:41:22.4438929Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:41:22.4466077Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T14:41:22.4486593Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:41:22.4514323Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T14:41:22.4534918Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:41:22.4566274Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T14:41:22.4585203Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:41:22.4620474Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T14:41:22.4636881Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:41:22.4664490Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T14:41:22.4685907Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:41:22.4713868Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T14:41:22.4738466Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:41:22.4771552Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T14:41:22.4810628Z Entering 'third_party/pocketfft' 2025-12-04T14:41:22.4839017Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T14:41:22.4857742Z Entering 'third_party/protobuf' 2025-12-04T14:41:22.4886085Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T14:41:22.4907893Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:41:22.4935805Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:41:22.4954819Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:41:22.4990328Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 
2025-12-04T14:41:22.5012052Z Entering 'third_party/psimd' 2025-12-04T14:41:22.5040281Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T14:41:22.5059000Z Entering 'third_party/pthreadpool' 2025-12-04T14:41:22.5087024Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T14:41:22.5106666Z Entering 'third_party/pybind11' 2025-12-04T14:41:22.5135593Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:41:22.5156122Z Entering 'third_party/python-peachpy' 2025-12-04T14:41:22.5183767Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T14:41:22.5204233Z Entering 'third_party/sleef' 2025-12-04T14:41:22.5233125Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T14:41:22.5254034Z Entering 'third_party/tensorpipe' 2025-12-04T14:41:22.5283021Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T14:41:22.5302474Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:41:22.5331816Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:41:22.5351996Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:41:22.5380432Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T14:41:22.5400635Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:41:22.5428075Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T14:41:22.5447606Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:41:22.5474884Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:41:22.5495114Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:41:22.5522959Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T14:41:22.5573220Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5602058Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5628991Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5654986Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5681120Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5707333Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5734181Z [command]/usr/bin/git config 
--file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5760027Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5784955Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5819522Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5837962Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5864941Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5891389Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5918161Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5942842Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.5968311Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp 
^includeIf\.gitdir: 2025-12-04T14:41:22.5993043Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6019360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6044655Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6070318Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6095229Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6121771Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6147543Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6171524Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6196892Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6221524Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6246532Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6270414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6296705Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6321439Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6346253Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6372097Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6397552Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6422845Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6448402Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6473973Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6499741Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6526294Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6550681Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6577197Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6603114Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6629462Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6657418Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6683105Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6709249Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6734663Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6760956Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6791486Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6816879Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6841759Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6867501Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6892528Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6917741Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6941698Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6966997Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.6991455Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7018539Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7042804Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7067442Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7092969Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7118647Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7142595Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7169202Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7193163Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7218477Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7247991Z [command]/usr/bin/git config 
--file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7272606Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7298147Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7326220Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7350891Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7374814Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7399620Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7427536Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7451993Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7477076Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7503291Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7529272Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7555043Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7580290Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7608801Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7633893Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:41:22.7750219Z A job completed hook has been configured by the self-hosted runner administrator 2025-12-04T14:41:22.7766218Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-12-04T14:41:22.7773043Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T14:41:22.7773330Z ##[endgroup] 2025-12-04T14:41:29.8031856Z Cleaning up orphan processes